Architecture · 2026-05

What do AI shopping agents actually read on your page?

Three classes of agents (training crawlers, inference browsers, commerce-specific agents) each read different surfaces. A practical map of which fix affects which kind of agent, and where AEO and SEO actually conflict.

When a merchant asks "what should I optimize for AI agents," the honest answer is: it depends which agent. The category bundles three completely different access patterns, each with different rules. Mixing them up is the single biggest reason AEO advice on the internet contradicts itself.

Here's the actual map.

Class 1 — Training-time crawlers

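These are the bots that build the model's knowledge. Examples: GPTBot, ClaudeBot, anthropic-ai, and Google-Extended. Google-Extended is strictly a robots.txt control token rather than a separate crawler: Google's usual crawlers fetch the page, and the token governs whether Google may use it for Gemini training. These bots sweep the open web on a schedule, ingest pages, and feed the next training snapshot.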

What they read: HTML body, JSON-LD, links, and images with alt text. They honor robots.txt strictly, so allowing them there is what makes your pages eligible for the next model's training data.
What moves the needle: Long-term brand recall inside the model. When a user asks ChatGPT "what's a comfortable everyday sneaker?" and the model answers from its training corpus, your inclusion in that corpus is what gets you named.
Time horizon: Slow. Effects show up in the next model snapshot — weeks to months.
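Because training crawlers obey robots.txt, the allowlist is ordinary robots.txt syntax. A minimal sketch using Python's standard urllib.robotparser to check which user-agent tokens a given file lets in; the robots.txt content and store URLs are hypothetical. Python's parser applies the first matching rule, while Google documents longest-match precedence, so the Disallow line is placed before Allow to keep both readings in agreement.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: named training crawlers may fetch everything except
# checkout; all other bots fall through to the wildcard group.
ROBOTS_TXT = """\
# Training crawlers: allowed everywhere except checkout.
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /checkout/
Allow: /

# Everyone else: keep out of cart and checkout.
User-agent: *
Disallow: /cart/
Disallow: /checkout/
"""


def crawl_permissions(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, per user-agent token, whether `url` may be fetched under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    page = "https://shop.example.com/products/everyday-sneaker"
    print(crawl_permissions(ROBOTS_TXT, ["GPTBot", "ClaudeBot"], page))
```

Each vendor runs its own matcher, so treat this as a sanity check on your rules, not a guarantee of how any specific bot will behave.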

Class 2 — Inference-time browsers

These are the bots that fetch pages live when a user is actively asking a question. Examples: ChatGPT browsing, Claude web search, Perplexity Sonar, Gemini grounding. They're not training; they're "the model with a browser, right now."

What they read: The rendered HTML body, Product JSON-LD, OG tags, prices on the page. They mostly don't check robots.txt — they treat the fetch as user-initiated, the same way your browser does when you click a link.
What moves the needle: Live discoverability. When a user prompt triggers the model to look up your page right now, can it extract a clean answer within the tight time budget of a live query? Dense factual HTML + Product JSON-LD wins here. Bloated marketing prose loses.
Time horizon: Immediate. Update the page and the next live query sees the new content.
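To see roughly what a fast reader can lift from a page without rendering it, you can pull the Product JSON-LD out of the raw HTML. A minimal sketch using only Python's standard library; it assumes product data sits in `<script type="application/ld+json">` blocks (at the top level, in a list, or under `@graph`) and is not how any particular agent is actually implemented.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._chunks: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None for bare attributes, hence the `or ""`.
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._chunks))
            self._inside = False


def _is_product(node) -> bool:
    kind = node.get("@type") if isinstance(node, dict) else None
    return kind == "Product" or (isinstance(kind, list) and "Product" in kind)


def extract_products(html: str) -> list[dict]:
    """Return Product nodes found at the top level, in a list, or under @graph."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    products = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the whole page
        if isinstance(data, dict):
            nodes = data.get("@graph", [data])
        elif isinstance(data, list):
            nodes = data
        else:
            continue
        if isinstance(nodes, dict):
            nodes = [nodes]
        products.extend(node for node in nodes if _is_product(node))
    return products
```

If this returns an empty list for your product page, an agent that relies on structured data has nothing to extract quickly and must fall back to your prose.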

Class 3 — Commerce-specific agents

These are the dedicated commerce surfaces. Examples: Shopify Agentic Commerce, Google Agentic Checkout, ChatGPT Shopping, Amazon Rufus. They're built around structured commerce protocols and have a strong preference for machine-parseable payloads over scraped HTML.

What they read first: /.well-known/ucp, the Universal Commerce Protocol manifest declaring services, capabilities, payment handlers, and signing keys. If your origin doesn't respond at that path, the commerce surface has no protocol entry point for your store, however clean your HTML is.
What they read next: Product JSON-LD with full identifiers (brand, GTIN/MPN, availability, aggregateRating). HTML body is a fallback when structured data is missing.
Time horizon: Immediate, and high-stakes. These agents drive actual transactions, not just mentions.
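A commerce-readiness check starts with those two reads: where the manifest should live, and whether the Product JSON-LD carries the identifiers listed above. A minimal sketch under stated assumptions: it does not fetch or validate the manifest (the UCP schema is not reproduced here), the store URL is a placeholder, and the field names are schema.org's `brand`, `gtin*`/`mpn`, `offers.availability`, and `aggregateRating`.

```python
from urllib.parse import urljoin

# schema.org properties for the identifiers named above. A GTIN can be
# published under the generic `gtin` property or a length-specific one.
GTIN_PROPERTIES = ("gtin", "gtin8", "gtin12", "gtin13", "gtin14")


def ucp_manifest_url(page_url: str) -> str:
    """Map any page on a store to its origin's /.well-known/ucp location."""
    return urljoin(page_url, "/.well-known/ucp")


def missing_commerce_fields(product: dict) -> list[str]:
    """List the identifier/offer fields a commerce agent would look for but not find."""
    missing = []
    if not product.get("brand"):
        missing.append("brand")
    if not product.get("mpn") and not any(product.get(p) for p in GTIN_PROPERTIES):
        missing.append("gtin or mpn")
    offers = product.get("offers") or []
    if isinstance(offers, dict):
        offers = [offers]
    if not any(isinstance(o, dict) and o.get("availability") for o in offers):
        missing.append("offers.availability")
    if not product.get("aggregateRating"):
        missing.append("aggregateRating")
    return missing
```

For example, `ucp_manifest_url("https://shop.example.com/products/sneaker?variant=2")` resolves to `https://shop.example.com/.well-known/ucp`, because an absolute path in `urljoin` replaces the page's path and query.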

The matrix

Surface	Training crawlers	Inference browsers	Commerce agents
HTML body	yes (main)	yes (main)	fallback
Product JSON-LD	yes (adds)	yes (primary)	yes (primary)
Open Graph tags	minor	yes	minor
/.well-known/ucp	no	no	yes (entry point)
llms.txt	emerging	no/maybe	no
robots.txt allow	yes (strict)	mostly ignored	ignored
Sitemap.xml	yes	no	no
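The matrix is easiest to query as a lookup table. A minimal sketch that copies the cells verbatim and, as a simplifying assumption, treats only entries beginning with "yes" as "read": "fallback", "minor", "emerging", "no/maybe", and "ignored" all count as not read.

```python
# The matrix above, cell for cell.
MATRIX = {
    "HTML body":        {"training": "yes (main)",   "inference": "yes (main)",     "commerce": "fallback"},
    "Product JSON-LD":  {"training": "yes (adds)",   "inference": "yes (primary)",  "commerce": "yes (primary)"},
    "Open Graph tags":  {"training": "minor",        "inference": "yes",            "commerce": "minor"},
    "/.well-known/ucp": {"training": "no",           "inference": "no",             "commerce": "yes (entry point)"},
    "llms.txt":         {"training": "emerging",     "inference": "no/maybe",       "commerce": "no"},
    "robots.txt allow": {"training": "yes (strict)", "inference": "mostly ignored", "commerce": "ignored"},
    "Sitemap.xml":      {"training": "yes",          "inference": "no",             "commerce": "no"},
}
CLASSES = ("training", "inference", "commerce")


def surfaces_read_by(agent_class: str) -> list[str]:
    """Surfaces whose cell for `agent_class` starts with "yes"."""
    return [surface for surface, row in MATRIX.items() if row[agent_class].startswith("yes")]


def read_by_every_class() -> list[str]:
    """Surfaces marked "yes" for all three classes."""
    return [s for s, row in MATRIX.items() if all(row[c].startswith("yes") for c in CLASSES)]
```

Under this reading, `read_by_every_class()` returns only `["Product JSON-LD"]`, which is the table's way of saying structured data is the one fix every class picks up.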

What this means for "AEO doesn't fight SEO"

A lot of AEO advice tells merchants to rewrite their body copy to specification-sheet density: strip the marketing prose, pack in numbers and citations. That advice is technically correct for one of the three classes (inference-time browsers) but ignores what it does to SEO and brand voice.

The clean separation is:

HTML body is shared. Humans, Googlebot, and inference-time browsers all read it. SEO already optimizes this layer. Don't blow it up.
JSON-LD is shared too. Googlebot uses it for Rich Results, commerce agents use it as their primary feed, and inference browsers extract it for fast answers. Adding fields here is purely additive: Googlebot doesn't penalize you for declaring more structured product data, as long as the markup matches what's visible on the page.
The sidecar layers (/.well-known/ucp, llms.txt, robots.txt rules scoped to AI user agents) don't exist in SEO output at all. No SEO tool produces them, and no SEO ranking signal depends on them. Adding them can't hurt your search rankings because they aren't in the search-ranking graph, provided your AI-bot rules leave the Googlebot and wildcard (*) groups in robots.txt untouched.

So the answer to "will AEO hurt my SEO" depends entirely on which AEO recommendations you act on. Sidecar layers: zero risk. Body rewrites: real risk that needs judgment.

What we can and can't guarantee

Some things are observable and standardized today:

Product JSON-LD works across every class of agent we've tested. Schema.org is the closest thing to a universal commerce interface.
UCP is being adopted by commerce-specific agents. Shopify ships it by default; Google's Agentic Commerce framework consumes it.
Training-time crawlers honor robots.txt. Allowing GPTBot makes your pages eligible for OpenAI's training crawls; disallowing it keeps them out of future crawls.

What we can't guarantee:

llms.txt is not yet officially consumed by GPT or Claude. It's a proposed standard. We ship it because the cost is zero and the future-proofing value is non-zero, but don't expect it to move recall today.
Inference-time browsers mostly ignore robots.txt. Disallowing GPTBot does not hide you from ChatGPT browsing.
The agent landscape will keep changing. What we audit against today is the empirical 2026 snapshot, not a frozen standard.
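Shipping llms.txt is cheap because the file is just markdown served at the site root. A minimal sketch that follows the proposal's layout (an H1 with the site name, a blockquote summary, then H2 sections of markdown links with optional notes); the store name, summary, and URLs are placeholders, and the proposal may still change.

```python
def build_llms_txt(name: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render an llms.txt body: H1 name, blockquote summary, then H2 link sections.

    `sections` maps a heading to (title, url, note) tuples; an empty note is omitted.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            # Each entry is a markdown link, optionally followed by ": note".
            lines.append(f"- [{title}]({url})" + (f": {note}" if note else ""))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_llms_txt(
        "Acme Shoes",
        "Everyday sneakers and running shoes.",
        {"Products": [("Everyday Sneaker",
                       "https://shop.example.com/products/everyday-sneaker",
                       "canvas upper, sizes 5-13")]},
    ))
```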

How aeoprepared scores against this

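Every scan we run checks four surfaces: the UCP manifest, structured data, agent identification with the page in context, and cross-agent recall without the page. The composite AEO Score isn't a single proxy; it's a weighted blend of all four, because together they cover the three agent classes: the manifest for commerce agents, in-context identification for inference browsers, recall without the page for what training crawlers put into the model, and structured data for all three.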
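A weighted blend here means a weighted mean: each sub-score is multiplied by its weight, and the sum is divided by the total weight. A minimal sketch of that arithmetic; the field names and weights are hypothetical placeholders chosen for illustration, not aeoprepared's production values.

```python
from dataclasses import asdict, dataclass


@dataclass
class SurfaceScores:
    """Hypothetical per-surface sub-scores on a 0-100 scale."""
    ucp_manifest: float
    structured_data: float
    in_context_identification: float  # agent identifies the product with the page supplied
    cross_agent_recall: float         # agents name the store without the page


# Placeholder weights for illustration only; not the production weighting.
EXAMPLE_WEIGHTS = {
    "ucp_manifest": 0.25,
    "structured_data": 0.375,
    "in_context_identification": 0.25,
    "cross_agent_recall": 0.125,
}


def composite_score(scores: SurfaceScores,
                    weights: dict[str, float] = EXAMPLE_WEIGHTS) -> float:
    """Weighted mean: sum(w_i * s_i) / sum(w_i)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    values = asdict(scores)
    return sum(values[name] * w for name, w in weights.items()) / total_weight
```

With these placeholder weights, a store with no manifest (0), complete structured data (100), good in-context identification (80), and middling recall (50) lands at 0 + 37.5 + 20 + 6.25 = 63.75: strong on-page work cannot fully offset a missing commerce entry point.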

If you only had to remember one thing: JSON-LD is universal, UCP is the agentic-commerce entry point, llms.txt is bet-the-future, and HTML body is the layer both your SEO tool and AI browsers will keep reading together.

See what agents see on your store

Free audit across all three agent classes — protocol manifest, structured data, and cross-agent recall.
