LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI (Including Many Fediverse Instances!!!)

fedipact@cyberpunk.lol

INSTANCES KNOWN TO HAVE BEEN SCRAPED BY META INCLUDE:

• mastodon.social

• mastodon.online

• tech.lgbt

• hackers.town

• chaos.social

• mastodon.org.uk

• mastodont.cat

• mastodon.de

• mastodon.xyz

• mastodon.coffee

• mastodon.cloud

• mastodon.scot

• mastodonapp.uk

• mastodon.green

• mastodon.ml

• mastodon.au

• mastodon.eus

• mastodonczech.cz

• mastodon.sdf.org

• mstdn.social

• troet.cafe

• techhub.social

• tchncs.de

• kolektiva.social

• mamot.fr

• defcon.social

• meow.social

• social.linux.pizza

• ioc.exchange

• eldritch.cafe

• yiff.life

• furry.engineer

• infosec.exchange

• blahaj.zone

• woof.group

• union.place

• queer.party

• sakurajima.moe

• pawb.social

• digipres.club

• journa.host

• corteximplant.net

• corteximplant.com

• octodon.social

• bitbang.social

• jorts.horse

• tenforward.social

• pnw.zone

• spore.social

• hear-me.social

• neuromatch.social

• vt.social

• cosocial.ca

• chitter.xyz

• tooter.social

• cloudisland.nz

• social.seattle.wa.us

• masto.es

• nobigtech.es

• mastodon.gal

• masto.host

• toot.community

• pony.social

• climatejustice.global

• pleroma.envs.net

• indiepocalypse.social

• anarchism.space

• disroot.org

• dragonscave.space

• toot.bike

• fuzzies.wtf

• norden.social

• beige.party

• ohai.social

• freeradical.zone

• metalhead.club

• treehouse.systems

• icosahedron.website

• sunbeam.city

• sunny.garden

• zeroes.ca

• ursal.zone

• chaosfem.tw

• mas.to

• mathstodon.xyz

• rubber.social

• todon.nl

• cupoftea.social

• nerdculture.de

• toad.social

there're definitely more, i just did ctrl+f when i thought of an instance name so i definitely missed some. will be editing this list to add them as i think of them

#FediPact #meta #threads

fedipact@cyberpunk.lol

i'm gonna be editing that list as i think of more so be sure to view it directly on cyberpunk.lol to make sure you get the whole thingy

#FediPact #meta #threads

essjayjay@tech.lgbt

@mods

Is this true WRT tech.lgbt?

@FediPact

bluestarultor@tech.lgbt

@essjayjay @mods @FediPact This is the first I'm personally hearing of it, but you do have to understand that scraping does not have to be a consensual process and scrapers have been doing all sorts of shady stuff to hide themselves. I can't personally speak more on the topic. However, I have raised it to the team to draft a proper response.

victimofsimony@infosec.exchange

@bluestarultor
@essjayjay @mods
@FediPact

You said scraping was legal. Presuming we're talking about the U.S.A. here, can you explain how that can be in a country that presumes everything I write defaults to being subject to my personal copyright?

thenexusofprivacy@infosec.exchange

At least so far, individuals haven't succeeded in copyright claims against web scrapers. Here's a good article on the US legal landscape as of a couple of years ago (with the caveat that it's by somebody who sees scraping as generally a good thing): https://blog.ericgoldman.org/archives/2023/08/web-scraping-for-me-but-not-for-thee-guest-blog-post.htm From a privacy perspective, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4884485 looks at the challenges.

@VictimOfSimony @bluestarultor @essjayjay @FediPact

bluestarultor@tech.lgbt

@thenexusofprivacy @VictimOfSimony @essjayjay @FediPact Also literally no one in this thread said it was legal. XD

Even the original article notes that it's illegal to be slurping up copyrighted works, but that they failed to convince the judge of meaningful damages meriting restitution.

I said scraping is "not necessarily consensual" and that's because various sites have entered partnerships to sell off their users' creations with some half-assed nod to getting their consent.

thenexusofprivacy@infosec.exchange

Fair enough, I was just responding to @VictimOfSimony's question about scraping and copyright.

@bluestarultor @essjayjay @FediPact

victimofsimony@infosec.exchange

@thenexusofprivacy
@bluestarultor
@essjayjay
@FediPact

This article seems to think the problem is that a third party is asserting the copyright. The fact that these class actions are becoming more popular with first parties seems to suggest you're mistaken. Also, the trespass issue I mentioned remains since there is no implied right of access to chattels for an illegal purpose. There's a tort here.

victimofsimony@infosec.exchange

@thenexusofprivacy
@bluestarultor
@essjayjay
@FediPact

We do appreciate the response.

thenexusofprivacy@infosec.exchange

There are quite a few class actions in process and it'll be interesting to see how things play out. And even though the plaintiffs in the Meta case didn't succeed, the court certainly left the door open to other attempts -- and arguably even encouraged them. https://www.technologyreview.com/2025/07/01/1119486/ai-copyright-meta-anthropic/ is a good overview of the Meta and Anthropic cases, and as they point out, the wins for the tech companies are less cut-and-dried than they seem at first.

Still, even though the answer may be different at some point, right now I think it's still true that so far individuals haven't succeeded in copyright claims against scrapers.

@VictimOfSimony @bluestarultor @essjayjay @FediPact
