🌱 Seedling noteworthy

The Internet's Most Powerful Archiving Tool Is in Peril

⇥ Kate Knibbs, www.wired.com

posted on April 13, 2026 in: archiving, tech, ai, media and journalism.
~946 words, about a 5 min read.

—

A number of other major journalism organizations have also recently moved to restrict the Wayback Machine from archiving their stories, including The New York Times. According to analysis by the artificial-intelligence-detection startup Originality AI, 23 major news sites are currently blocking ia_archiverbot, the web crawler commonly used by the Internet Archive for the Wayback project. The social platform Reddit is too. Other outlets are limiting the project in different ways.

I think this is all pretty bad, and I actually argued against blocking the Internet Archive's crawler at my own workplace. I lost, though I understand why.

Journalism is writing the first draft of history. It is incredibly important, and in the digital age it is more transient than ever. The Internet Archive is performing an immensely valuable service, for free, for media companies that often can't be bothered to run their own archives. On some news sites, stories stop working as little as a year after they are published. That record of our recent past is badly needed, and the Internet Archive helps preserve it.

But the blocking is an aftereffect of the other side of this problem: media companies are struggling and, at a very fundamental level, are in a battle for survival with AI systems that crawl their work, then repackage and resell it with no recompense.

AI companies are playing a decades-old game that has penalized creative people on the internet, media companies, and journalists of all stripes: aggregation. They are attempting to aggregate the entire web and, unlike the aggregators of previous eras, they barely link back.

Whatever you think about our machine learning overlord corporations, there's no doubt that they are sucking immense value from journalists and giving almost none of it back. The only tool media companies really have in their arsenal is blocking. These systems are in desperate need of source material, especially up-to-date, professionally written news. By blocking crawlers and demanding licensing fees, media companies get a little leverage back. If they don't block the Internet Archive, the bots can simply crawl journalists' work from that platform instead.

The Internet Archive is an open platform; for it to exist at all, its archives must be openly licensed, copyleft. It can't really shift its stance, either, and it doesn't want to end up as a cop for some parts of its archives and not others.

This isn't an easy conflict to resolve, and I don't have a great answer. There are arguments to be made for some sort of conditional licensing, but that seems nearly impossible to enforce.

Other people believe in opening up media sites and hoping that links and AI Engine Optimization (AEO, the new younger sibling of SEO) will either lead to revenue or advance their mission goals (especially true at non-profits). That's a nice idea, but I suspect it is naïve in the long run.

I love and support the Internet Archive, but it is in serious danger. I don't think the web can really work the same way anymore. Not so long as AI companies metastasize like a cancer, growing fat on the mutated content of the rest of the web while strangling the flow of users and resources. The Internet Archive needs to consider an alternative to how things currently work.

Users of the internet also need to band together and archive sites for themselves. If it is important, we can't rely on the Internet Archive capturing it anymore. We need to use in-browser tools to capture it ourselves. I did it with this article here (you can find the link at the bottom of the page), though WIRED might have preferred if I didn't.

What advice do I have for media companies desperately seeking leverage? Some of them are doing the only thing they can do right now. I think the other thing they should be doing is taking better care of their archives. Consider opening up or delivering archives of older content to the IA and anyone else who wants them. Old articles have much lower revenue value to media companies, and they are harder for AI crawlers to reach and less likely to be targeted, so there might be a compromise there.

We need to do something to save the first draft of history, and right now few journalism outlets are doing a decent job of that. The IA has the capability; we just need to figure out a new way of working with it. Perhaps we should consider returning archives to local libraries, where a human with a library card has to show up and access them in person, in the library only? I'm not sure.

This is a tough situation. It really sucks that the AI companies, which are worth billions thanks to sucking up the vast open web, are forcing the internet to close.

This week, advocacy organizations including the Electronic Frontier Foundation and Fight for the Future rallied journalists around the Wayback Machine’s cause. The coalition collected more than 100 signatures from working journalists who recognize the tool’s value and presented a letter of support to the Internet Archive. Signatories range from television mainstay Rachel Maddow to independent reporters like Spitfire News’ Kat Tenbarge and User Mag’s Taylor Lorenz. “In previous generations, journalists would turn to the physical archives of a local newspaper or of a local public library to access historical reporting and follow the threads of the present back into history,” the letter reads. “With many newspapers closed, and no clear path for local public libraries to preserve digital-only reporting, the work of safeguarding journalism’s record increasingly falls to the Internet Archive.”

—
— Via Kate Knibbs, The Internet's Most Powerful Archiving Tool Is in Peril (archived)