AIs and “stealing” knowledge — the collective is destroying the motivation for generating new knowledge. By Peter Girnus, who works for Google’s Threat Intelligence Group.
Don’t steal from Google:
I published a report this month about “distillation attacks” — when outside actors query our models thousands of times to extract the underlying logic and replicate it.
We identified over 100,000 prompts from a single campaign. We called it “intellectual property theft.” We called it a “violation of our Terms of Service.” We said it “represents a form of IP theft” that we would disrupt, mitigate, and potentially pursue legal action against.
What Google stole from the world:
I need to tell you how we built the model they are trying to steal.
We scraped the internet. The entire internet. We crawled every website, every forum, every blog, every book we could digitize, every academic paper, every Reddit comment, every news article, every piece of creative writing that anyone ever posted anywhere.
We did not ask. We did not compensate. We did not attribute. We ingested the collective output of human civilization and called it a training dataset. …
We built Gemini on the commons. Every blog post, every open-source project, every Stack Overflow answer, every personal essay someone wrote at 2 AM — we ingested it, we processed it, we monetized it. The people who wrote those things did not receive an email. They did not receive a check. They received a subscription offer. …
Researchers found over 200 million copyright symbols in our training data. Publishers discovered that Gemini can reproduce entire chapters of their books verbatim. There are active lawsuits. Disney sent cease-and-desist letters. The European Publishers Council filed an antitrust complaint. A class action is expanding. A hearing is scheduled for May.
Double standard:
We called what we did “research.”
We called what they are doing to us “theft.”
I want to explain the difference.
- When we scrape the entirety of human knowledge without permission and use it to build a commercial product we sell for $20 a month, that is innovation.
- When someone queries our model 100,000 times through the API we provide to extract the reasoning we built from their data, that is a distillation attack.
The distinction is that we did it first. And we wrote the Terms of Service.
I should explain what “distillation” means. It is when someone takes the output of a mature model and uses it to train a smaller, cheaper model. The knowledge flows from the teacher to the student. We call this theft when it happens to us. We call it “knowledge distillation” when we do it to the open web. We even have a product page for it. You can distill Gemini, with our permission, using our tools, for a fee. You cannot distill Gemini without our permission. The underlying technique is identical. The difference is the invoice.
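To make the mechanism concrete, here is a toy sketch of distillation-by-query, the technique described above: a “student” that can only send inputs to a hidden “teacher” model fits its own copy from the responses alone. This is a deliberately minimal illustration, not any vendor’s actual pipeline; the teacher is a hypothetical linear model, and all names here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "teacher": a fixed linear model we may query but not inspect.
W_teacher = rng.normal(size=(4, 3))

def query_teacher(x):
    # The attacker sees only outputs; the weights stay hidden.
    return x @ W_teacher

# Step 1: send many queries and record the teacher's responses.
X = rng.normal(size=(10_000, 4))
Y = query_teacher(X)

# Step 2: train a "student" to imitate those responses
# (here, ordinary least squares recovers the hidden weights exactly).
W_student, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The student now reproduces the teacher's behavior on unseen inputs.
x_new = rng.normal(size=(1, 4))
print(np.allclose(query_teacher(x_new), x_new @ W_student))  # True
```

With a real neural network the student is trained on the teacher’s soft outputs rather than solved in closed form, but the flow of knowledge is the same: many queries in, a cheaper replica out. Nothing in the technique distinguishes a paid, permissioned distillation from an unpermissioned one.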
Legal both ways:
In December 2025, we sued a company called SerpApi for scraping our search results. In the same quarter, publishers sued us for scraping their books. We are simultaneously the plaintiff and the defendant in the same crime. The crime is copying. We have filed it under two different categories depending on the direction.
Still, AIs are great tools. But if the generators of knowledge are not compensated fairly, the generation of knowledge will wither. If no one can benefit from the effort and risk of research or origination, why bother?
I am reminded of pop music. Until the Internet, rock stars made megabucks from record sales. But now that everyone copies music, the originators earn far less. And hasn’t music become poorer for it, with few new or original tunes anymore? Whereas in the five decades before 2000 there was a veritable torrent of new and interesting music, and there were rock and pop stars. Now: no reward, no motivation. Welcome to the collective; welcome to the longhouse.