The Nexus of Discussions

Discussion is growing about LLM bots and other tools scraping public posts.

  • sw_isac@mastodon.iftas.org wrote (#1):

    Discussion is growing about LLM bots and other tools scraping public posts. This raises concerns about privacy, consent, and community safety. IFTAS has compiled community conversations, expert advice, and tools for identifying and blocking scrapers: https://connect.iftas.org/library/tools-resources/web-crawlers-and-scrapers/

    • thenexusofprivacy@infosec.exchange wrote (#2):

      Specifically with Meta, here's a post from @cuchaz with some tools to set up firewall-level blocking. https://gladtech.social/@cuchaz/115004304985099620

      Also, Meta's crawler user agents include Meta-ExternalFetcher and Meta-ExternalAgent; the latter is the one Meta says it uses for AI training. https://www.businessinsider.com/meta-web-crawler-bots-robots-txt-ai-2024-8
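
For admins who want to act on those two user-agent names, here is a minimal Python sketch. It is an illustration, not anything taken from the linked posts: the helper names `build_robots_txt` and `is_blocked_agent` are hypothetical, and the only inputs from this thread are the two agent tokens. Keep in mind that robots.txt only works when a crawler chooses to honor it, which is presumably why firewall-level blocking comes up in the same post.

```python
# User-agent tokens named in the post above.
META_AGENTS = ["Meta-ExternalAgent", "Meta-ExternalFetcher"]


def build_robots_txt(agents=META_AGENTS):
    """Return robots.txt text that disallows the whole site for each agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


def is_blocked_agent(user_agent_header, agents=META_AGENTS):
    """Case-insensitive substring match against a request's User-Agent header."""
    header = user_agent_header.lower()
    return any(agent.lower() in header for agent in agents)


if __name__ == "__main__":
    # Print the rules so they can be appended to an existing robots.txt.
    print(build_robots_txt())
```

`is_blocked_agent` is the kind of check a reverse proxy or application middleware could run before returning a 403. A crawler can send any User-Agent string it likes, so treat this as one layer rather than a guarantee.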

      And for instances that serve media from a separate domain, make sure your robots.txt and firewall blocks are in place for that domain as well. I've heard from a couple of admins who recently realized theirs weren't.
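
One way to catch the separate-media-domain gap is to run each domain's robots.txt through the same check. The sketch below uses Python's standard `urllib.robotparser`. The function name `unblocked_agents` and the two sample robots.txt bodies are made up for illustration; in practice you would paste in (or fetch) the real file from each domain. It only checks robots.txt, not firewall rules.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named earlier in this thread.
META_AGENTS = ["Meta-ExternalAgent", "Meta-ExternalFetcher"]


def unblocked_agents(robots_txt, agents=META_AGENTS):
    """Return the agents that this robots.txt text still allows to fetch '/'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if parser.can_fetch(agent, "/")]


# Hypothetical robots.txt bodies: a main domain with blocks in place,
# and a media domain that was never updated.
main_domain = (
    "User-agent: Meta-ExternalAgent\nDisallow: /\n\n"
    "User-agent: Meta-ExternalFetcher\nDisallow: /\n"
)
media_domain = "User-agent: *\nAllow: /\n"

if __name__ == "__main__":
    print("main:", unblocked_agents(main_domain))
    print("media:", unblocked_agents(media_domain))
```

With these sample inputs the main domain should come back with an empty list and the media domain should list both agents, which is the situation described above.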

      @sw_isac
