LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI (Including Many Fediverse Instances!!!)
"The tech giant is sidestepping guardrails that websites use to prevent being scraped, data show, in a move whistleblowers say is unethical and potentially illegal."
ARTICLE: https://www.dropsitenews.com/p/meta-facebook-tech-copyright-privacy-whistleblower
FULL PDF: https://www.dropsitenews.com/api/v1/file/b3555944-e204-4f5e-9a64-e44281b19a82.pdf
INSTANCES KNOWN TO HAVE BEEN SCRAPED BY META INCLUDE:
• mastodon.social
• mastodon.online
• tech.lgbt
• hackers.town
• chaos.social
• mastodon.org.uk
• mastodont.cat
• mastodon.de
• mastodon.xyz
• mastodon.coffee
• mastodon.cloud
• mastodon.scot
• mastodonapp.uk
• mastodon.green
• mastodon.ml
• mastodon.au
• mastodon.eus
• mastodonczech.cz
• mastodon.sdf.org
• mstdn.social
• troet.cafe
• techhub.social
• tchncs.de
• kolektiva.social
• mamot.fr
• defcon.social
• meow.social
• social.linux.pizza
• ioc.exchange
• eldritch.cafe
• yiff.life
• furry.engineer
• infosec.exchange
• blahaj.zone
• woof.group
• union.place
• queer.party
• sakurajima.moe
• pawb.social
• digipres.club
• journa.host
• corteximplant.net
• corteximplant.com
• octodon.social
• bitbang.social
• jorts.horse
• tenforward.social
• pnw.zone
• spore.social
• hear-me.social
• neuromatch.social
• vt.social
• cosocial.ca
• chitter.xyz
• tooter.social
• cloudisland.nz
• social.seattle.wa.us
• masto.es
• nobigtech.es
• mastodon.gal
• masto.host
• toot.community
• pony.social
• climatejustice.global
• pleroma.envs.net
• indiepocalypse.social
• anarchism.space
• disroot.org
• dragonscave.space
• toot.bike
• fuzzies.wtf
• norden.social
• beige.party
• ohai.social
• freeradical.zone
• metalhead.club
• treehouse.systems
• icosahedron.website
• sunbeam.city
• sunny.garden
• zeroes.ca
• ursal.zone
• chaosfem.tw
• mas.to
• mathstodon.xyz
• rubber.social
• todon.nl
• cupoftea.social
• nerdculture.de
• toad.social
there're definitely more, i just did ctrl+f when i thought of an instance name so i definitely missed some. will be editing this list to add them as i think of them
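
For anyone who wants to repeat the ctrl+f check in bulk, here is a minimal sketch. It assumes you have already copied the text of the leaked list out of the PDF into a plain string. The function name `find_listed_instances` and the sample text are made up for illustration and are not taken from the actual list. It matches whole hostnames, so a short domain like mas.to is not "found" inside a longer, unrelated one.

```python
import re

# Matches hostname-looking tokens such as "mastodon.social" or "social.seattle.wa.us".
HOSTNAME_RE = re.compile(r"[a-z0-9-]+(?:\.[a-z0-9-]+)+")

def find_listed_instances(list_text, candidates):
    """Return the candidate domains that appear in list_text as a whole
    hostname, or as the parent domain of a listed subdomain."""
    hostnames = set(HOSTNAME_RE.findall(list_text.lower()))
    found = []
    for domain in candidates:
        d = domain.lower()
        # Exact match, or a listed host that ends in ".<domain>" (e.g. www.<domain>).
        if any(h == d or h.endswith("." + d) for h in hostnames):
            found.append(domain)
    return found

# Made-up sample text standing in for the extracted list (NOT real data).
sample_text = "https://mastodon.social/about\nmas.tools\nwww.tech.lgbt, example.com"
my_instances = ["mastodon.social", "mas.to", "tech.lgbt", "chaos.social"]
print(find_listed_instances(sample_text, my_instances))
```

A plain substring search would also flag mas.to here because of "mas.tools", which is why the sketch compares complete hostnames instead.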
-
@essjayjay @mods @FediPact This is the first I'm personally hearing of it, but you do have to understand that scraping does not have to be a consensual process and scrapers have been doing all sorts of shady stuff to hide themselves. I can't personally speak more on the topic. However, I have raised it to the team to draft a proper response.
-
@bluestarultor @essjayjay @mods @FediPact You said scraping was legal. Presuming we're talking about the U.S.A. here, can you explain how that can be in a country that presumes everything I write defaults to being subject to my personal copyright?
-
At least so far, individuals haven't succeeded in copyright claims against web scrapers. Here's a good article on the US legal landscape as of a couple of years ago (with the caveat that it's by somebody who sees scraping as generally a good thing): https://blog.ericgoldman.org/archives/2023/08/web-scraping-for-me-but-not-for-thee-guest-blog-post.htm
From a privacy perspective, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4884485 looks at the challenges.
-
@thenexusofprivacy @VictimOfSimony @essjayjay @FediPact Also literally no one in this thread said it was legal. XD
Even the original article notes that it's illegal to be slurping up copyrighted works, but that they failed to convince the judge of meaningful damages meriting restitution.
I said scraping is "not necessarily consensual" and that's because various sites have entered partnerships to sell off their users' creations with some half-assed nod to getting their consent.
-
Fair enough, I was just responding to @VictimOfSimony's question about scraping and copyright.
-
@thenexusofprivacy @bluestarultor @essjayjay @FediPact This article seems to think the problem is that a third party is asserting the copyright. The fact that these class actions are becoming more popular with first parties seems to suggest you're mistaken. Also, the trespass issue I mentioned remains since there is no implied right of access to chattels for an illegal purpose. There's a tort here.
-
@thenexusofprivacy @bluestarultor @essjayjay @FediPact We do appreciate the response.
-
There are quite a few class actions in process and it'll be interesting to see how things play out. And even though the plaintiffs in the Meta case didn't succeed, the court certainly left the door open to other attempts -- and arguably even encouraged them. https://www.technologyreview.com/2025/07/01/1119486/ai-copyright-meta-anthropic/ is a good overview of the Meta and Anthropic cases, and as they point out the wins for the tech companies are less cut-and-dried than they seem at first.
Still, even though the answer may be different at some point, right now I think it's still true that so far individuals haven't succeeded in copyright claims against scrapers.