No Hacks: Optimising the Web for AI Agents

215: The Agent-Broken Web - Why AI Can't See Your Website

Slobodan "Sani" Manić Episode 215

Your website might rank #1 on Google but be completely invisible to ChatGPT, Claude, and Perplexity. In this episode, let's break down why a huge chunk of the web is fundamentally broken for AI systems - not because of bad content, but because of technical decisions that made sense for humans but make sites invisible to the AI systems rapidly becoming the front door to the internet.

Chapter Timestamps

  • 00:00:00 - Introduction: The new game your website is losing
  • 00:01:43 - The Scale of the Problem: AI crawler traffic explosion
  • 00:05:19 - The JavaScript Problem: Why AI crawlers can't see your content
  • 00:10:28 - The Bot Protection Paradox: Accidentally blocking AI
  • 00:14:40 - The Speed Requirement: Why 200ms matters
  • 00:17:46 - AI Agents Are Struggling Too: Browser agents and their limitations
  • 00:20:46 - How to Fix It: 6 things you need to do
  • 00:25:33 - Closing: The web is adapting again

Key Statistics

  • 569 million GPTBot requests on Vercel's network in a single month
  • 370 million ClaudeBot requests in the same period
  • 305% growth in GPTBot traffic (May 2024 to May 2025)
  • 157,000% increase in PerplexityBot requests year-over-year
  • 33% of organic search activity now comes from AI agents
  • ~40% failure rate for the best AI browser agents on complex tasks

The 6 Things to Fix

  1. Implement Server-Side Rendering (SSR) - If your site uses a JavaScript framework (React, Vue, Angular) with client-side rendering, switch to SSR or static site generation immediately. Use Next.js, Nuxt, or a pre-rendering service.
  2. Add Structured Data with JSON-LD - Expose key information in machine-readable format using schema.org markup. Microsoft confirmed Bing uses this to help Copilot understand content.
  3. Optimize for Speed - Target server response time under 200ms. First Contentful Paint under 1 second. Largest Contentful Paint under 2.5 seconds.
  4. Check Your Bot Protection Settings - Review Cloudflare, AWS WAF, or your CDN's bot management. Make a deliberate decision about GPTBot, ClaudeBot, and PerplexityBot access.
  5. Kill Infinite Scroll and Lazy Loading for Content - Use paginated URLs with standard HTML links. Ensure high-value content is in the initial HTML response.
  6. Keep Sitemaps Current - Maintain proper redirects, consistent URL patterns, and fix broken links.

No Hacks is a podcast about web performance, technical SEO, and the agentic web. Hosted by Slobodan "Sani" Manic.

The Agent-Broken Web: Why AI Can't See Your Website
===

[00:00:00] Here's a fun little thought experiment for you. Let's say you've done everything right with your website. It's beautiful. You've invested in SEO. You're ranking on the first page of Google for your key terms, maybe number one. You won the game that everyone's been playing for the last 20 years. But there's a new game now, and in this new game, your website might as well not exist.

[00:00:26] I'm talking about AI search: ChatGPT, Claude, Perplexity, the AI agents that are increasingly becoming how people discover and interact with the web. A huge chunk of the web is fundamentally broken for these systems. Not because of bad content, not because of poor SEO, but because of technical decisions that made perfect sense for humans, and maybe even for traditional search engines, but make your website completely invisible to the AI systems that are rapidly becoming,

[00:01:01] I'd say, the front door, or at least the side door, to the internet for most people. So today, let's talk about why this is happening, how bad the situation actually is, and what you can do about it. Before we do, I want you to go to nohackspod.com. No Hacks, POD for podcast, dot com. Check out the new website, and maybe go to the subscribe page and subscribe to our newsletter.

[00:01:30] I'm your host, Sani, and this is No Hacks.

[00:01:33] 

[00:01:43] Let's start with some numbers, because the scale of what is happening is frightening. Staggering. Let's look at Vercel. They're the company behind Next.js, and they host, I would assume, millions of websites. They published a deep analysis of AI crawler traffic on their blog in late 2025, so a few months ago. What they found was interesting. OpenAI's GPTBot is their version of Googlebot.

That's the crawler that feeds ChatGPT.

[00:02:16] It generated 569 million requests across their entire network in a single month, all coming from ChatGPT. Anthropic's ClaudeBot, their version of the same thing, followed with 370 million. Combined, that's close to a billion requests in a month, just on Vercel's infrastructure alone. That represents 20-ish percent of what Googlebot did in the same period on Vercel's network.

[00:02:51] Now, you may be thinking, yeah, Googlebot's been around for decades and these AI crawlers are new. Yes, but that's why the growth numbers are even more interesting. Let's look at Cloudflare as well. Cloudflare holds a huge chunk of the internet; they control a lot of the traffic on it.

[00:03:11] Remember when Cloudflare went down last year and you couldn't do anything online? That's how big and important they are. They published their 2025 Year in Review recently, and they found that from May 2024 to May 2025, GPTBot traffic grew by 305%. PerplexityBot is still tiny, and we're talking about small absolute numbers, but there was a 157,000% increase in raw requests from PerplexityBot in a year.

[00:03:44] The starting point was tiny, yes, I get it. But still, this is growth. We're watching the web, and how it is consumed, analyzed, and used, transform in real time. And this is not just about training data for the LLMs anymore. Cloudflare's analysis showed that over the past 12 months, about 80% of AI crawling was for training purposes.

[00:04:11] But there's a shift toward AI user-action crawling: bots that simulate human behavior, that actually do things on behalf of users, the exact thing this podcast is about. And that segment grew more than 15 times year over year. So the question for you is, why should you care about this? Well, according to BrightEdge's internal tracking and Search Engine Land's report on it in their AI optimization guide, AI agents now account for roughly 33% of organic search activity. A third. That's not a rounding error. This is a fundamental shift in how people find and do things online, and we're not even seeing the full effect of it yet. So now we've established that AI crawlers are everywhere.

They're growing fast, and they're increasingly driving real traffic and leading to real transactions. So now let me tell you why so many websites are completely, completely failing to capitalize on this.

[00:05:16] 

[00:05:19] Here's something that might surprise you if this is not what you do day in and day out. AI crawlers, all of those:

[00:05:27] GPTBot, ClaudeBot, PerplexityBot, for the most part, they do not see JavaScript-rendered content at all. I know, I know. We spent the last decade building increasingly sophisticated JavaScript applications because we could. React, Vue, Angular, single-page applications that feel smooth and modern and responsive, and look nice, and work great, kind of, for the users.

[00:05:54] They mostly work for Google, because Googlebot uses a headless Chrome browser that renders JavaScript, even that with some limitations, delays, timeouts. But GPTBot, ClaudeBot, PerplexityBot, all those LLM bots, they do not execute JavaScript at all. Zero. This is not speculation. This is not speculative knowledge.

[00:06:17] This is real. Prerender.io, a company that provides pre-rendering services, published an analysis where they tracked over half a billion GPTBot fetches. Half a billion, with a B. They found zero evidence of JavaScript execution. Zero. Vercel has confirmed the same thing in their analysis, and the quote that I think captures this whole thing is: unlike Googlebot, which uses a headless browser to render JavaScript, GPTBot, ClaudeBot, and PerplexityBot overwhelmingly consume raw, unrendered HTML to conserve compute and maintain low-latency retrieval.

[00:07:00] Yeah, this is about the costs. This is not about GPTBot, or Sam Altman, hating JavaScript as a language. No, not executing and not rendering JavaScript is what makes this sustainable for them. If they had to run JavaScript, they would probably be bankrupt by now, or they would not crawl as many pages. What does this mean in practice?

[00:07:27] Well, it means that a website that is fully client-side rendered, where you serve a huge JavaScript bundle, the user downloads it in their browser, and that is what renders the website, your server doesn't do any of it, it just serves the bundle, a website that operates like that may look blank to all of these LLM bots.

Best-case scenario, only some parts are going to be missing, if you're not using one bundle to render everything. If you are, none of the content gets rendered or consumed by all those AI search platforms. Blank. The beautifully designed, feature-rich website that takes seven seconds to load and three megabytes of JavaScript to show "Hello World"? Nothing to an AI crawler.

[00:08:17] That's an empty page. But according to Prerender.io, it gets worse. If your website relies on JavaScript to load things like product schema, prices, availability, navigation, canonical tags, all of those things Google has been screaming that you should not be using JavaScript to load,

all of that content is also invisible to AI systems. Your interactive elements, popups, charts, infinite-scroll content, anything behind clickable tabs that loads on click: blocked and invisible. That's bad. Search Engine Journal also had a report about this, I believe it's called 2026 Enterprise SEO Trends, and they reported that out of all the major AI crawlers, only Google's Gemini and Applebot actually bother to render JavaScript, or can render it.

So if you want to exist in ChatGPT's, Claude's, and Perplexity's world, you need to rethink your website's architecture. Oh, and about the infinite scroll I mentioned earlier. You know, the design pattern where you scroll to the bottom of the page and, yay, easy engagement, more content loads when you reach the bottom.

[00:09:34] It's borrowed from social media feeds, obviously. Google's very own Martin Splitt addressed this directly in an interview with Search Engine Journal, and he said, what does Googlebot do about infinite scroll? It doesn't scroll. Googlebot doesn't scroll. And if Googlebot doesn't scroll, GPTBot, ClaudeBot, and PerplexityBot absolutely do not scroll.

[00:10:01] So most of this dynamically loaded content is completely invisible to these crawlers. Completely invisible. So if your best content is below the fold, loaded on scroll, behind a "load more" button, you might as well delete it, if AI discovery is what you're worried about. It is that serious, and it's very binary.

[00:10:24] It either works or it doesn't.
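One way to see exactly what these crawlers see is to fetch your page's raw HTML, with no JavaScript execution, and check whether your key content is actually in it. Here's a minimal sketch in Python; the URL, phrases, and user-agent string are placeholders for illustration, not anything a specific crawler requires:

```python
import urllib.request

def fetch_raw_html(url: str, user_agent: str = "GPTBot") -> str:
    """Fetch the page the way a non-rendering crawler does: one HTTP
    request, no JavaScript execution."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def visible_without_js(html: str, phrases: list[str]) -> dict[str, bool]:
    """Which key phrases actually appear in the initial HTML response?"""
    return {p: (p in html) for p in phrases}

# Usage sketch (hypothetical URL and phrases):
# html = fetch_raw_html("https://example.com")
# print(visible_without_js(html, ["Our flagship product", "Add to cart"]))
```

If a phrase your customers care about comes back `False` here, it doesn't exist for GPTBot, ClaudeBot, or PerplexityBot, no matter how good it looks in a browser.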

[00:10:26] 

[00:10:28] So we have all those AI crawlers that cannot handle JavaScript. But what gets really interesting is that Cloudflare also had an announcement, which MIT Technology Review covered. Cloudflare said it would default to blocking AI bots from visiting the websites it hosts. As a default. This was July 2025, I think, when Cloudflare went all in on blocking AI bots by default.

[00:11:00] And look, I understand why they gave you this option as someone hosting a website on Cloudflare, which is what I do. Every single website I have is on Cloudflare and I love it. That edge hosting is the best, it's the fastest. I'm getting carried away. I understand why they did this. They have legitimate concerns about AI companies scraping content without compensation, and this is how they addressed it: the training data question, the whole ecosystem of AI models and how they get built.

[00:11:29] Yes, absolutely. Even Stack Overflow. Do you remember? Stack Overflow was a big deal for developers until it kind of collapsed and died. Their CEO was quoted in that MIT piece saying that community platforms like Stack Overflow, like Quora, even Reddit, I guess, need to be compensated if they fuel LLMs and give them training data.

[00:11:55] That's a fair point. But while we're all debating the ethics of AI training, and that's a big topic that cannot be covered in this episode, though it's extremely important, the practical reality of where we are today, of what's happening in the world of the agentic web today, is that millions of websites, because of this Cloudflare toggle, are accidentally invisible to AI search and to AI agents. Not because of a choice, but, you know,

the default has changed, and maybe there are websites that haven't updated their settings. Before making this decision, Cloudflare looked at robots.txt across 3,900 of the top 10,000 domains, and what they found is that AI crawlers were the most frequently blocked user agents: GPTBot, ClaudeBot.

They had the highest number of full disallow directives. This is from their blog post, I believe it was titled something like "Who's crawling your website," which they published in 2025, and I'll try to put all the links in the episode description. And this is not just explicit blocking.

So picture this scenario, for example. Google can crawl and index your pages just fine, because Googlebot is explicitly allowed. But the newer AI user agents could be blocked by default, following these patterns. So you're visible in Google search, but the clients are asking you, why am I not on ChatGPT?

Why am I not on Claude? Why am I not in all those other systems? And maybe this is why the big retailers may be blocking AI agents on purpose, but that's a different story. Amazon, Shopify, Walmart, they're all trying to protect their data, basically, and they're all trying not to let Sam Altman train his baby on it.

[00:13:58] If you remember, Amazon sent a cease and desist, this was October, I believe, and I mentioned it in my Conversion Hotel talk. Amazon sent a cease and desist letter to Perplexity, saying their AI shopping agent is not allowed to shop on Amazon unless it identifies itself as an agent. Perplexity literally called Amazon a bully.

[00:14:23] And yeah, there are some strategic choices here about resisting the access that AI has to your content. But for most websites, it's not strategic. It's just something accidental that happened, and it may be costing them visibility.
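For reference, robots.txt is where a deliberate decision like this usually ends up. A sketch only; check each vendor's documentation for the user-agent tokens they currently publish:

```
# Explicitly allow the major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Or, to make the opposite decision deliberately:
# User-agent: GPTBot
# Disallow: /
```

Either choice is fine; the point is that it should be a choice, not an inherited default.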

[00:14:38] 

[00:14:40] Not great. But let's talk about something that I love. I know quite a few people who do too, but it is a very old-fashioned thing that's becoming more critical than ever before.

[00:14:52] Speed. How fast your website and your web pages load. Kevin Indig, the author of the Growth Memo newsletter on Substack and a very well-respected SEO strategist, published a piece called State of AI Search Optimization 2026, and he made some really important points about latency.

[00:15:11] So LLM retrieval, the process by which AI systems pull in web content to inform their answers, operates under tight latency budgets during real-time search. His recommendation for your server response time is that it should be under 200 milliseconds. Did you check what kind of time to first byte your server response times are? In that pause,

[00:15:41] there was room for like ten times 200 milliseconds. That's fast. It needs to be that fast. And why this matters is because slow responses prevent pages from entering the candidate pool; they miss their retrieval window. The AI system needs an answer right away. If your page is going to show up in that chat response, it needs to be super fast.

If your server takes too long to respond, you will not be considered. And if you have consistently slow response times, that might trigger crawl rate limiting: the AI crawler will learn that your website is slow and start requesting it less frequently, and it's a death spiral for your AI visibility.

[00:16:28] So Kevin goes on to cite data showing that sites with a load time of less than one second receive three times more Googlebot requests than sites with server response times of over three seconds. One second is slow. Three seconds is catastrophic. And if you have, let's say, a WordPress or a Webflow website that takes a long time to generate the page and send it back, ah, you're in a world of trouble.
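If you want to sanity-check your own time to first byte without any tooling, here's a rough Python sketch; the URL is a placeholder, and real monitoring tools will be more accurate than this:

```python
import socket
import ssl
import time
from urllib.parse import urlparse

def time_to_first_byte(url: str) -> float:
    """Rough TTFB: seconds from sending the request until the first
    response byte arrives (excludes DNS and TLS setup time)."""
    parts = urlparse(url)
    host, path = parts.hostname, parts.path or "/"
    port = parts.port or (443 if parts.scheme == "https" else 80)

    sock = socket.create_connection((host, port), timeout=10)
    if parts.scheme == "https":
        sock = ssl.create_default_context().wrap_socket(sock, server_hostname=host)

    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    start = time.perf_counter()
    sock.sendall(request.encode())
    sock.recv(1)  # blocks until the first byte of the response
    elapsed = time.perf_counter() - start
    sock.close()
    return elapsed

def within_budget(ttfb_seconds: float, budget_ms: float = 200.0) -> bool:
    """Compare a measured TTFB against the 200 ms budget discussed here."""
    return ttfb_seconds * 1000 <= budget_ms

# Usage (hypothetical URL):
# print(within_budget(time_to_first_byte("https://example.com/")))
```

Run it a few times and look at the worst result, not the best; crawlers don't get to pick your fastest response.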

[00:17:00] Easy to solve, but a world of trouble. By the way, if you want to check any of this, I built and released a free tool last week. It's called Glimpse. If you go to glimpse.webperformancetools.com, you can test your website. You'll get speed insights, accessibility, how AI sees your website, how it looks with and without JavaScript.

So, glimpse.webperformancetools.com. Register, and please get in touch with me if you have any feedback. Now, if you've been putting off all those performance optimizations, you have a reason to fix them. You have a reason to ask for the budget to fix them. That's all I'm saying. And let's talk about AI agents for a second,

because this is not just about retrieval, it's not just about LLMs, it's not just about answers. Let's talk about AI agents, the ones that can see and interact with a website like a human. And you know, you may be thinking that's how agents operate, they don't have any problems, they can do whatever they want.

[00:18:03] I'm talking about things like OpenAI's Operator, which uses what they call a computer-using agent, CUA. It takes screenshots of the page, interacts with it, talks to the LLM, and gets a response: you should click here, you should do this, you should do that. Just like a human would. So that sounds like the problems of latency and all that are solved.

[00:18:27] Yes and no. No, sorry, it's not even a yes and no, because these things cost a lot of tokens and a lot of money. The more work an AI agent needs to do, if it needs to look at your page, take a screenshot, analyze the screenshot, that's a cost, the cost of operation, and that makes you an undesirable destination for that agent.

Browser Use is a very popular open-source framework for building browser-based AI agents, and I absolutely love using it for building any kind of automation. Anything you want to do with an agent in a browser, Browser Use makes super simple. And they ran some tests, basically sending agents to websites and telling them to try to do things.

And the success rate was 80-something percent. So 10 to 15 percent of those agents will fail without ever telling you. You will never know what happened, unless you're tracking them somehow, maybe just because your website is too slow. So what does that mean for your website? If your checkout process requires multiple steps, has confusing navigation, relies heavily on JavaScript interactions, presents any kind of friction,

friction that's there just for friction's sake, or because no one thought it through, the AI agents will struggle. And when an AI agent struggles, it doesn't try harder. It just gives up and goes somewhere else. This is a very new and very competitive landscape. So if an AI agent is trying to complete a purchase on behalf of a user, and I believe this is the future we're heading towards, this is how internet interactions will look,

and if your website is difficult to navigate, the agent will take that transaction to a competitor. It will take its talents to South Beach, and if you know what that reference is, let me know. It will go somewhere where the website is easier to work with. It will not debate it with you. It will not send you an angry email telling you that your checkout is broken.

It will just go away, and you will never know that this entire interaction happened.

[00:20:43] 

[00:20:46] So I've spent, let me see what the time is, I've spent 20-ish minutes talking about everything that's broken. And by the way, glimpse.webperformancetools.com. Register for a free account.

[00:21:00] It's a free tool released by my company. Go there and check your website. So I've spent 20 minutes telling you about things that are broken. Now let's talk about how you can fix them. Let's talk about a few things you can do to make life easier for AI agents. Number one: if your website is built with a JavaScript framework that is client-side rendered, you need to implement server-side rendering or static site generation today. Immediately.

Immediately. You need to do that. Maybe you can use something like Next.js for React, and that's what, I'm saying it again, nohackspod.com is running on. You can do something like that, or you can pre-render your website. It'll be significantly faster and easier to consume. This is not a marginal improvement.

You'll see, this is a transformational change if your website is currently client-side rendered. Number two: structured data. Just use JSON-LD, use schema.org. You need to expose all of your key information in a format that machines can easily parse. Google recommends JSON-LD in a script tag, which is clean.

It's separate from your HTML. The information must match what's in the HTML, because if the system recognizes that your website is trying to trick it, giving the machine one thing in the schema and telling the human user something completely different, you won't have a lot of fun in the future. So just implement schema.

Use the JSON-LD format, and even Microsoft has confirmed that Bing uses schema.org markup to help Copilot understand content. Perplexity, ChatGPT, Claude, they all benefit from well-structured data. Think of it as writing a clear summary of your page that has to match the information on the page.
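For illustration, here's what a minimal JSON-LD block might look like for a product page. Every value here is a placeholder, and each one must match what the visible HTML actually says:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "description": "A short, accurate summary that matches the page copy.",
  "offers": {
    "@type": "Offer",
    "price": "19.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
</script>
```

The script tag sits in your page's head or body, separate from the markup humans see, which is exactly why it's easy for machines to parse.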

[00:22:57] Okay. This is very important: it lives alongside the content you have for humans. Number three, my favorite: speed. Your time to first byte should be under 200 milliseconds. This is not easy. You need a lot of caching, good server configuration, all kinds of things. And I'm not talking about loading the entire page

[00:23:24] in 200 milliseconds. I'm talking about the server taking less than 200 milliseconds to say, hey, here's the page, just load this. Your First Contentful Paint, FCP, should be less than a second. LCP, Largest Contentful Paint, the Core Web Vital. By the way, the first episode of this podcast was about Core Web Vitals.

[00:23:49] February 18th, 2021, a full-circle moment. The LCP value should be under 2.5 seconds. This is Google's recommendation, not mine. These metrics matter for users, they matter for Google, and they matter for AI retrieval. Number four: check your bot protection settings. Look at Cloudflare, AWS WAF, your CDN, whatever security service you're using.

Look at the bot management tools and make a decision about whether to allow GPTBot, ClaudeBot, PerplexityBot, and all the others, or not. Do not live by the defaults. Make a deliberate decision and live with it. Number five: infinite scroll and lazy loading of content. Gone, yesterday, please. I have nothing to explain here.

Just kill it and do it properly. Number six: sitemaps. Keep them up to date. If you have a CMS, hopefully it updates them for you. If you're using WordPress with something like Yoast or another SEO plugin, you're fine. But your sitemaps need to be up to date, and what they do is tell AI crawlers how frequently they should attempt to fetch your pages.
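As a reminder of what "up to date" means in practice, each sitemap entry can carry a lastmod date that crawlers use to decide what to refetch. A sketch with a placeholder URL and date:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/agent-broken-web</loc>
    <lastmod>2026-01-15</lastmod>
  </url>
</urlset>
```

A stale lastmod, or entries pointing at redirected and broken URLs, tells crawlers your sitemap can't be trusted.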

So maintain proper redirects and consistent URL patterns, and be careful about broken links. Just have a healthy website. I don't think that's too much to ask for. So in closing, and this is the thing I keep coming back to, for the last six to nine months, ever since I started getting obsessed with this agentic web shift: the web was built for us.

It was built for humans, then it was adapted for search engine crawlers. Googlebot. Everyone knows Googlebot. And that took years. It created multiple industries around SEO; technical SEO was basically a fork of SEO, or of web development, I don't know which. We need to adapt again. We need to adapt how we build websites for AI crawlers and AI agents.

Now, the technical requirements are different, but at their core they are really very similar. No JavaScript execution to display key content. Latency budgets that are even tighter. Blocking behaviors you need to be very careful about. But the core principle of all this is the same: if you want to be discovered, you need to be easily accessible to the systems doing the discovering.

The really good news about all this is that, as I said earlier, most of these fixes, server-side rendering, structured data, speed optimization, improve traditional SEO and improve user experience. So you're not choosing between optimizing for AI and optimizing for humans. You're building a more robust, accessible website that's going to work better for everyone involved. And the websites that figure out how to do this

now, while everyone is still debating whether AI search matters or not, those are the websites that are going to dominate the next era of the web. So your homework is: go look at your website right now. View the page source. What's there before JavaScript runs? Check your server logs for AI crawler user agents and see what response codes they're getting.
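Checking those logs can be as simple as grepping for the bot names, or a few lines of Python. Here's a sketch assuming the common "combined" access-log format; the log path in the usage note is a placeholder:

```python
import re
from collections import Counter

# User-agent substrings of the AI crawlers discussed in this episode
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Minimal pattern for combined-format logs: the status code after the
# quoted request, and the final quoted field (the user agent)
LOG_LINE = re.compile(r'" (\d{3}) .*"([^"]*)"$')

def ai_crawler_hits(log_lines):
    """Count (bot, status_code) pairs for requests from AI crawlers."""
    counts = Counter()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if not m:
            continue
        status, agent = m.group(1), m.group(2)
        for bot in AI_BOTS:
            if bot in agent:
                counts[(bot, status)] += 1
    return counts

# Usage sketch (hypothetical log path):
# with open("/var/log/nginx/access.log") as f:
#     for (bot, status), n in ai_crawler_hits(f).most_common():
#         print(bot, status, n)
```

If the bots show up at all, look at the status codes: lots of 403s or 429s usually means your bot protection is making the decision for you.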

[00:27:24] Review your CDN settings and AI bot blocking, and for God's sake, just kill lazy loading and infinite scroll if you have them. You might be surprised by what you find, but now you have a better idea of what to do about it. And if you need help with any of this, just get in touch with me. I can help you. We can start with an audit and see what to do next.

[00:27:46] But if you're struggling with any of these things, just please get in touch with me and let's see what we can do. That's it for today. I'm your host, Sani. This was another episode of No Hacks. Don't forget to check out the new website, nohackspod.com, and leave some feedback. I'll talk to you next week.



Podcasts we love

Check out these other fine podcasts recommended by us, not an algorithm.

From A to B

Shiva Manjunath

Product for Product Management

Matt Green & Moshe Mikanovsky