News & Views
July 9, 2025

Cloudflare Changes the Rules for AI Crawlers

A shift toward permission-based website access for AI

A Shift Toward Permission-Based Access

Cloudflare announced a significant change to how AI companies interact with websites across its network, which supports nearly 20% of the internet’s sites. Going forward, AI crawlers will be blocked by default unless they have explicit permission to access content. The change applies to new domains using Cloudflare’s services and is intended to give website owners more control over their content.

The broader context for this decision is the growing concern among creators, publishers, and platform operators that AI companies have been collecting data from websites without consent or adequate compensation. In the traditional search engine model, crawlers index content and direct users back to the source, driving traffic and revenue. AI-generated responses, by contrast, often extract content without attribution and present it directly to users, cutting the source out of the loop.

Cloudflare’s announcement moves the internet closer to a model in which creators and site owners decide whether to grant or withhold their content from AI systems. The policy shift reflects a growing demand for transparency and consent in the use of online content to train AI models.

Implications for AI, Developers, and Security

For AI companies, this change introduces both technical and economic challenges. Many large models were trained on massive datasets gathered from public websites. Now that access is being gated, companies will need to identify their crawlers, explain how the data will be used, and in some cases negotiate licensing terms. Brendan Purdy, a data scientist, notes that this may drive a move toward synthetic or highly curated datasets. He compares the shift to the introduction of GDPR.

“Overall, if this becomes the standard procedure, it will be a similar paradigm shift to GDPR, in the sense that, just as GDPR flipped personal information consent from opt-out to opt-in, Cloudflare is flipping AI crawling from ‘take IP unless blocked’ to ‘ask before taking IP.’ Unlike GDPR, which was driven by regulation, this is a market-driven technical enforcement. As such, Cloudflare’s enforcement mechanism is immediate and decisive…”

  • Brendan Purdy, Mathematician & Data Scientist

On the development side, the change may simplify protection against unwanted scraping. Developers have long dealt with bots trying to harvest content or interfere with site operations. Laura Berge, a software engineer and instructor, says that building protection into infrastructure is a welcome step.

“As a web developer, understanding how to protect your site is crucial, from intellectual property such as blogs and content sites to preventing fraudulent applications like those in the financial sector or other malicious actors, including bots. This is an area of knowledge that’s constantly evolving as bots become more sophisticated. Having some of this easily built into the website settings by the cloud provider sounds like a necessary and significant step forward.”

  • Laura Berge, Software Engineer, Educator

Cybersecurity teams will also need to adapt to these changes. Blocking bots by default raises the stakes for authentication and bot identity. Nathan Wendlowski, a cybersecurity specialist, sees this as a new front in the ongoing effort to secure digital systems.

“Treating AI bots as potential threats and requiring them to prove their identity aligns perfectly with modern cybersecurity principles. The hard part will be distinguishing between good bots (such as researchers) and business bots, and bad bots attempting to steal data. This is a challenge, but it’s the cat-and-mouse game that we cyber folks are always playing.”

  • Nathan Wendlowski, Cybersecurity Specialist, Educator

Wendlowski also highlights the benefit of requiring companies to organize and classify their data more clearly. To allow or deny access to specific bots, organizations need a stronger understanding of what they’re storing, its value, and who is attempting to use it.
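
To make the allow-or-deny idea concrete, here is a minimal sketch of a default-deny policy keyed on content category. It is an illustration only, not Cloudflare's implementation: the `CrawlerPolicy` class, the category labels, and the crawler names are all hypothetical. The sketch also trusts a crawler's self-reported name, which a real system could not do; verifying bot identity is the hard part Wendlowski describes.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CrawlerPolicy:
    """Hypothetical default-deny policy: a crawler is blocked unless it
    has been explicitly permitted for a given content category."""

    # Maps a content category (e.g. "blog") to the crawler names allowed to read it.
    allowed: Dict[str, Set[str]] = field(default_factory=dict)

    def permit(self, category: str, crawler: str) -> None:
        # Names are stored lowercased so lookups are case-insensitive.
        self.allowed.setdefault(category, set()).add(crawler.lower())

    def is_allowed(self, category: str, crawler: str) -> bool:
        # Default deny: unknown categories and unlisted crawlers are both blocked.
        return crawler.lower() in self.allowed.get(category, set())


# Illustrative setup; the crawler names and categories are made up.
policy = CrawlerPolicy()
policy.permit("blog", "ExampleSearchBot")
policy.permit("blog", "ExampleResearchBot")
policy.permit("docs", "ExampleSearchBot")

for category, crawler in [
    ("blog", "ExampleResearchBot"),
    ("docs", "ExampleResearchBot"),
    ("payments", "ExampleSearchBot"),
]:
    verdict = "allow" if policy.is_allowed(category, crawler) else "block"
    print(f"{crawler} -> {category}: {verdict}")
```

The design choice worth noticing is that nothing is permitted until someone writes it down, which is why the site owner first has to know what categories of content exist and which crawlers they are willing to admit to each.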

A Step Toward a New Standard

Cloudflare’s new default setting is not a complete solution, but it represents a meaningful step toward rebalancing the relationship between content creators and AI developers. It gives creators and platform owners more control while placing new obligations on those who want to use web content for AI training or inference.

Giovanni DiFeterici, VP of Product at Flatiron, frames it as a necessary correction.

“Cloudflare is drawing a line around the idea that original content has value. It isn’t just something to be leveraged. Of course, this will likely raise the cost of developing foundational AI models. That’s unavoidable. But allowing AI to be built entirely on appropriated content isn’t a viable future.”

Whether other infrastructure providers follow Cloudflare’s lead remains to be seen. But the decision signals a clear shift in expectations. The default is changing. Creators no longer carry the burden of opting out; those who want to use their work carry the burden of asking permission.
