Report · 2026-05-20

State of AEO 2026: We audited 32 DTC stores. Here's what AI agents actually see.

Median AEO Score: 58. 81% of audited stores already publish a UCP manifest, mostly because Shopify ships one by default, but 0% publish signing keys. Skims scores 8% on content and 100% on cross-agent recall.

For store owners — what the data says

→ Most stores have the technical basics. 81% of audited stores can already be discovered by AI shopping agents at a protocol level — usually because Shopify ships the manifest by default.
→ But the content layer is thin. Average content score is just 37.2%. Product pages routinely strip out the dense factual signals AI shopping assistants actually quote — concrete numbers, customer quotes, citations to press.
→ Cross-agent recall is the wild card. The average is 77.3%, but the spread is huge. Some categories average close to 90%; others barely clear 10%.
→ Most fixes take 1–2 hours. The leverage isn't in re-platforming — it's in adding structured data, a few concrete numbers, and recent third-party citations to your existing pages.

We ran every store through the same four-layer audit our public scanner uses: a protocol manifest probe, a content-signal scan, a brand-identification check across GPT, Claude and Gemini, and a cold-start visibility query. Five stores blocked our crawler outright; that itself is a finding. The remaining 27 produced the dataset below.
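For readers who want to see what a protocol manifest probe amounts to, here is a minimal Python sketch. It is not the scanner's code. The only detail taken from this report is the /.well-known/ucp path; the function names, the timeout, and the choice to treat every failure as "no manifest" are assumptions made for illustration.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen

UCP_PATH = "/.well-known/ucp"  # path named in this report


def manifest_url(store_url: str) -> str:
    """Build the well-known manifest URL from any URL on the store's origin."""
    return urljoin(store_url, UCP_PATH)


def fetch_manifest(store_url: str, timeout: float = 10.0):
    """Return the parsed manifest as a dict, or None if it cannot be read.

    Assumption: a missing file, a blocked request, a timeout, and a
    non-JSON body are all reported the same way (None).
    """
    request = Request(manifest_url(store_url), headers={"Accept": "application/json"})
    try:
        with urlopen(request, timeout=timeout) as response:
            parsed = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers HTTP errors, DNS failures and timeouts;
        # ValueError covers invalid JSON and undecodable bytes.
        return None
    return parsed if isinstance(parsed, dict) else None
```

Calling fetch_manifest("https://your-store.example") returns a dict when a manifest is served and None otherwise. A real audit would likely separate "blocked" from "absent", since this report counts blocked crawls as their own finding.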

The headline

81% of stores already publish a UCP manifest at /.well-known/ucp. Most don't know it — Shopify ships it by default for stores on the new platform tier. Agents can already discover and transact with them.
0% publish signing_keys. Every single store in our dataset is missing the JWK array that lets a UCP agent cryptographically verify the merchant's responses. This is the highest-leverage single fix in the entire industry right now.
Mean content score: 37.2%. Even stores with perfect protocol compliance routinely strip their product pages of structured data, statistics, and quoted social proof — the exact signals frontier models look for.
Mean cross-agent recall: 77.3%. Asked to recommend products in a store's category, the three frontier models mention the brand a little over three quarters of the time on average. The variance between brands is enormous; see Travel below.
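One way to read the recall number is as a simple fraction: of all the answers collected from the models, how many mention the brand. The Python sketch below works under that assumption. The function name, the data shape, and the plain substring match are illustrative choices, not a description of the scoring code behind this report, and the brand and model names are made up.

```python
def cross_agent_recall(brand: str, answers: dict[str, list[str]]) -> float:
    """Fraction of all collected answers, across models, that mention the brand.

    `answers` maps a model label to the list of answer texts it returned.
    Matching is a case-insensitive substring test (an assumption).
    """
    needle = brand.casefold()
    total = sum(len(texts) for texts in answers.values())
    if total == 0:
        return 0.0
    hits = sum(
        needle in text.casefold()
        for texts in answers.values()
        for text in texts
    )
    return hits / total


# Hypothetical answers to "recommend a high-protein cereal".
answers = {
    "model_a": ["Try Acme Crunch or a store-brand granola."],
    "model_b": ["Popular picks include ACME CRUNCH and oat clusters."],
    "model_c": ["Most shoppers start with classic corn flakes."],
}
recall = cross_agent_recall("Acme Crunch", answers)  # 2 of 3 answers mention the brand
```

Plain substring matching overcounts brands whose names are ordinary words, so a production measurement would need stricter matching than this sketch uses.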

Finding 1 — Brand mind-share beats structured data, by a lot

The cleanest example is Skims. Its content score is 19% — almost no Product JSON-LD, no aggregateRating, no rich identifiers. By any traditional SEO metric the page looks abandoned. But GPT, Claude, and Gemini all recommend Skims when asked about the relevant category — visibility 100%, AEO Score 76. The frontier models have absorbed enough brand mentions from elsewhere on the web that they don't need the page to tell them what Skims is.

This pattern repeats. The strongest predictor of cross-agent visibility in our dataset is not the content score. It's how often the brand has been written about in the corpora the models were trained on. The implication is sobering for anyone outside the top tier: structured-data fixes make you readable; only mind-share gets you recommended.

Finding 2 — Travel is in real trouble

Direct-to-consumer luggage brands have the worst cross-agent recall of any vertical we measured — average visibility of 11% despite an average protocol score of 90%. They are protocol-compliant but invisible. Ask any frontier model for the best carry-on under $300 and it answers with Tumi, Samsonite, Travelpro — the legacy names that dominate the training corpora. Away, Béis, and Calpak are competing against decades of luggage-industry mind-share, not just each other.

Finding 3 — Food & Beverage is the surprise category

Food & Bev turned in the highest average AEO Score of any vertical — 73.8. Magic Spoon (82), Olipop, Liquid Death and Athletic Brewing all pair clean UCP manifests with strong cross-agent recall. Our reading: these are brands that won novel categories outright (better-for-you cereal, functional soda, premium water, non-alcoholic beer), so the training corpora co-locate the brand name with the category — and the engineering teams have shipped the protocol layer too.

Finding 4 — Apparel has the inverse problem

Apparel posts the lowest average protocol score — 30% — but the highest average cross-agent visibility: 89%. Gymshark, Vuori, Bombas and Buck Mason don't expose a UCP manifest, so a Gemini agent literally cannot transact with them. They are, however, mentioned constantly in social posts and reviews, so when a shopper asks ChatGPT for activewear or basics, the brand name comes up. They are discoverable but un-buyable. If they ship UCP this quarter, they leapfrog most of this list.

Finding 5 — The signing-key gap

Every single store with a manifest is missing the signing_keys JWK array — the cryptographic primitive that lets agents verify a response was actually issued by the merchant and hasn't been tampered with. This isn't a content problem; it's an unfinished engineering checklist. The fix is roughly three lines of JSON, and it's the difference between an agent trusting your endpoint and an agent falling back to a safer competitor.
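To make the gap concrete, here is a hedged Python sketch that checks a parsed manifest for a usable signing_keys array. The required members per key type follow the JWK RFCs (7517, 7518, 8037). Placing signing_keys at the top level of the manifest, the function name, and the placeholder key values are assumptions for illustration, so check the UCP specification before relying on the exact shape.

```python
# Members each JWK key type must carry, per RFC 7518 (EC, RSA) and RFC 8037 (OKP).
REQUIRED_BY_KTY = {
    "EC": {"crv", "x", "y"},
    "OKP": {"crv", "x"},
    "RSA": {"n", "e"},
}


def signing_key_problems(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means the array looks usable."""
    keys = manifest.get("signing_keys")  # assumed top-level field
    if not isinstance(keys, list) or not keys:
        return ["signing_keys is missing or empty"]
    problems = []
    for index, jwk in enumerate(keys):
        if not isinstance(jwk, dict):
            problems.append(f"key {index}: not a JSON object")
            continue
        required = REQUIRED_BY_KTY.get(jwk.get("kty"))
        if required is None:
            problems.append(f"key {index}: missing or unsupported kty")
            continue
        missing = sorted(required - jwk.keys())
        if missing:
            problems.append(f"key {index}: missing {', '.join(missing)}")
    return problems


# Hypothetical manifest fragment; the x and y values are placeholders, not real key material.
example_manifest = {
    "signing_keys": [
        {"kty": "EC", "crv": "P-256", "kid": "merchant-key-1",
         "x": "<base64url x>", "y": "<base64url y>"}
    ]
}
```

This only checks shape. It does not verify that the coordinates are valid base64url or that the key matches the one actually used to sign responses.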

The bottom of the list

The lowest-ranked stores in our dataset are all in apparel: Buck Mason, Vuori and Cariuma sit between AEO scores of 34 and 42. None of them publishes a UCP manifest. All of them are household-name brands. The protocol gap and the brand-recognition gap are running on different clocks. Fix the protocol first, because that's the one you control.

What this means for a merchant reading this

Check your own /.well-known/ucp today. If you're on Shopify you probably have one. If you don't, that's your single highest-leverage fix.
Add a signing_keys JWK array. It's roughly three lines of JSON, and it puts you ahead of every brand in this dataset.
Audit the content layer. Most stores in our dataset score below 50% on content — missing identifiers, statistics, quoted social proof. These are editable in your CMS, not in your stack.
Measure your cross-agent recall continuously. The visibility number moves week over week as the models update; static measurement is a snapshot, not a strategy.
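For the content-layer item, a quick self-check is to look at the Product JSON-LD on a product page and see which of the signals named in this report are absent. The Python sketch below uses schema.org property names (sku, gtin, mpn, aggregateRating, review). The function name and the pass/fail rules are assumptions for illustration, not the scoring rubric behind the content score.

```python
# schema.org Product properties that can serve as a product identifier.
IDENTIFIER_FIELDS = ("sku", "gtin", "gtin8", "gtin12", "gtin13", "gtin14", "mpn")


def missing_content_signals(product: dict) -> list[str]:
    """List the signals absent from one parsed schema.org Product object."""
    if product.get("@type") != "Product":
        return ["not a Product object"]
    missing = []
    if not any(product.get(field) for field in IDENTIFIER_FIELDS):
        missing.append("identifier")
    rating = product.get("aggregateRating")
    has_rating = (
        isinstance(rating, dict)
        and rating.get("ratingValue") is not None
        and (rating.get("reviewCount") or rating.get("ratingCount"))
    )
    if not has_rating:
        missing.append("aggregateRating")
    if not product.get("review"):
        missing.append("review")
    return missing


# Hypothetical product data, as it would look after json.loads() on the page's JSON-LD.
thin_product = {"@type": "Product", "name": "Example Carry-On"}
rich_product = {
    "@type": "Product",
    "name": "Example Carry-On",
    "sku": "EX-CARRY-01",
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": 4.6, "reviewCount": 212},
    "review": [{"@type": "Review", "reviewBody": "Fits every overhead bin I tried."}],
}
```

Here missing_content_signals(thin_product) reports all three signals as missing, while rich_product reports none. The sketch handles a single "@type" string only; pages that declare a list of types would need an extra case.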

Dataset, methodology and full per-store breakdowns are on the leaderboard. Re-run any store yourself on the home page.
