🌱 Seedling noteworthy

The questions to ask when considering if AI content scraping should qualify as fair use

Posted in: Notable Articles, tech and ai.
~949 words, about a 5 min read.

Series Listing
  1. Generative AI is Bad For Us:
    1. Stop using generative AI as a search engine
      December 6, 2024
    2. Defining AI as a political project
      December 9, 2024
    3. The phony comforts of the AI industry's useful idiots
      December 9, 2024
    4. The questions to ask when considering if AI content scraping should qualify as fair use
      December 19, 2024

I think the argument that AIs get the 'right to read' like humans illustrates why this is wrong. Why should AIs get the right to read MORE freely? If AIs can scrape the web, then everything should be personally & freely accessible to me, and I should be able to make, save, & post as many copies as I want.

(This discussion originates from a BlueSky thread)

Obviously, incentivizing creators by allowing them to make money from their work accomplishes this goal. But I think the framers of the fair-use principle understood that if copyright was solely about compensating creators — if no one could use even a small part of an existing work without asking for permission and paying money — artistic and intellectual creativity could be impeded. As the Supreme Court noted, fair use was designed to counterbalance cases that might “stifle the very creativity which [copyright law] is designed to foster.” The point is that copyright law is designed for the benefit of society as a whole, not just as a way for creators to make money. The whole point of the “transformative” test is to decide whether the other aspects of an infringing use compensate for or counterbalance the obvious infringement.

The problem is that Fair Use is a good concept, but to those of us who have paid attention to scraping for years, it is insane that AI somehow gets a fairer Fair Use while others don't get to use the same techniques and arguments for far more culturally important purposes like archiving or review. The Internet Archive gets sued, but rapacious, climate-burning, culturally useless, mechanically questionable systems aimed at eliminating employment and capturing capital don't? I would argue that the problem with copyright is exactly this:

If copyright is for anything, it is for making sure we have a healthy cultural & artistic landscape. The Internet Archive helps and encourages it. So do YouTubers trying to post reviews. But does AI? Arguably, AI is the very cultural-negative-impact that copyright is supposed to defend against!

Mathew Ingram argues that the example of positive impact is "Already, studies have found that people prefer AI-generated poetry to the real thing." but... so what? People liking something isn't cultural impact. 'People like it' is an extremely poor argument for copyright infringement or plagiarism. The problem, which AI highlights, is that both copyright & fair use, as they are enacted in the modern United States of America, are fundamentally broken concepts, enforced at the whims of the powerful to abuse the poor, weak, and underprivileged. AI is just one more powerful entity set to abuse creators.

Is AI "transformative"? WHO CARES?! That is NOT the REAL test that is applied outside of ivory towers. The thing that actually, in reality, decides what is or is not copyright infringement is whether it inconveniences the powerful. Let's not pretend otherwise. AI is a project to further empower the powerful, collect capital in the hands of those who already have too much money, and remove employment from the very people who are building and lovingly creating our culture and art. If copyright was intended for anything, it was to stop that. I can go on & on about how AI outright copies & plagiarizes. How it isn't actually transformative. How its rapacious growth isn't proving out to be truly useful. But none of that is relevant. That is an abstract argument that separates the question of law from what the law is for.

Copyright law is desperately in need of change & reform in the internet age. It was never great, but it is even poorer as a defense of creatives now. Spending time & effort to argue for AI's ability to bypass it is just reinforcing everything that's wrong w/copyright. It is the wrong path.

Fix copyright law for the rest of us before you try and fix it for AI. To play into the prioritization of Silicon Valley plutocrats is to doom the rest of us, not just because AI is burning the world (though that's a good reason), but because it continues their atrocious hoarding of power and wealth. Aaron Swartz is dead, and he's the one we know best. There are many others who--because of how we enforce copyright and disempower artists and archivists--have had their lives ruined, destroyed, disgraced and made worse by doing the very thing that suddenly everyone is standing up for AI to do.

These are not abstract academic questions. These are our lives, our art, our culture, &--for the artists being stolen from--matters of survival. It's dishonorable to pretend AI is some special case, & it is disrespectful to pretend it is separate from the questions of power & capital it attempts to hide.

Don't ask if AI is fair use. Ask if it is fair that everyone else who claimed fair use doesn't get the same consideration. Ask if the artists currently alive should have some say in how their work is used. Ask if being an artist should be enough to live on. Ask about power, who gets it and how.

(Additional source on this topic available via a context page.)

I couldn't disagree with this more, not on the technicalities, but because it misses the fundamental point of the question about what AI systems should be freely allowed to do:

Are there cases where an AI engine might produce an obvious copy of a copyrighted work? Of course, in which case fair use likely wouldn’t apply. But the ingestion or indexing of all that content should be considered fair use, because of the benefits that AI could generate in non-infringing ways.



— Via Mathew Ingram, Why AI content scraping should qualify as fair use
Page History

This page was first added to the repository on December 19, 2024 in commit a3010b83 and has since been amended once.

  1. AI Content scraping title change
  2. Noteworthy addition