What's New: The AI-Friendly Checker, Smarter Meta Audits and Canonical Checks

We've shipped a run of updates over the last few weeks, and rather than let them disappear silently into production, we want to start writing them up. This is the first of what will be a regular changelog: the fixes and features that are already live, what they do, and why we built them.

Most of this batch points in one direction — making the tool useful in a world where AI assistants, not just search engines, read your pages. But there are some sharper audit checks in here too. Everything below is live right now; point the scanner at your sitemap and you'll see it.

The big one: an AI-Friendly Checker

The headline addition is a brand-new tool: the AI-Friendly Checker. It scores how well each page on your site can be read, understood and cited by AI crawlers and answer engines like ChatGPT, Perplexity and Google's AI Overviews.

The key insight it's built around is that most AI crawlers don't run JavaScript — they read the raw HTML your server returns. So the checker fetches each page the same way, without executing scripts, and scores what it can actually extract. Each page gets a 0–100 readiness score broken down across the signals that matter:

Extractable content — how much real text is in the raw HTML. A near-empty body is the biggest red flag, because it usually means the page is client-rendered and invisible to crawlers that don't run JavaScript.
Structured data — whether the page ships JSON-LD (schema.org) for machines to lift clean facts from.
Title and description, heading outline, and a semantic <main> — the structure a model uses to parse and represent your page.
A declared language and a freshness signal — so assistants can answer in the right language and favour current content.
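
To make the "raw HTML" idea concrete, here is a minimal Python sketch that collects a few of these signals from a page's unrendered HTML using only the standard library's html.parser. It is not the checker's implementation. The signal names, the choice to count words outside script, style and title tags, and the RawHtmlSignals and page_signals helpers are illustrative assumptions. It also leaves out the freshness signal and the 0–100 weighting, since the post doesn't spell those out. In practice you would fetch the page with a plain HTTP GET (for example urllib.request.urlopen), so no scripts run, and pass the body in.

```python
from html.parser import HTMLParser


class RawHtmlSignals(HTMLParser):
    """Collects a few AI-readiness signals from raw, unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.words = 0             # extractable words outside script/style/title
        self.lang = None           # value of <html lang="...">
        self.has_main = False      # semantic <main> element present
        self.has_json_ld = False   # <script type="application/ld+json"> present
        self.has_title = False
        self.has_description = False
        self.h1_count = 0
        self._skip = None          # tag whose text we are currently ignoring

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.lang = attrs.get("lang")
        elif tag == "main":
            self.has_main = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.has_description = bool((attrs.get("content") or "").strip())
        elif tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self.has_json_ld = True
        if tag in ("script", "style", "title"):
            self._skip = tag

    def handle_endtag(self, tag):
        if tag == self._skip:
            self._skip = None

    def handle_data(self, data):
        if self._skip == "title":
            self.has_title = self.has_title or bool(data.strip())
        elif self._skip is None:
            self.words += len(data.split())


def page_signals(html: str) -> dict:
    """Run the parser over one page's raw HTML and return its signals."""
    parser = RawHtmlSignals()
    parser.feed(html)
    parser.close()
    return {
        "words": parser.words,
        "lang": parser.lang,
        "main": parser.has_main,
        "json_ld": parser.has_json_ld,
        "title": parser.has_title,
        "description": parser.has_description,
        "h1": parser.h1_count,
    }


# A typical client-rendered shell: the content only appears after app.js runs.
shell = ('<html lang="en"><head><title>App</title>'
         '<script src="/app.js"></script></head>'
         '<body><div id="root"></div></body></html>')
print(page_signals(shell)["words"])  # 0
```

The shell page has a title and a declared language but zero extractable words, which is exactly the near-empty body flagged above as the biggest red flag.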

Click any score and it expands into a per-check breakdown so you can see exactly what passed, what's borderline, and what failed. For the full background on why these signals matter, see our companion guide on generative engine optimisation.

Site-wide AI signals too

Two things that matter for AI can't be seen on any single page, so the checker handles them once per scan at the domain level. It reports whether you publish an llms.txt (a curated map for language models), and whether your robots.txt allows the major AI crawlers — GPTBot, ClaudeBot, PerplexityBot, Google-Extended and others — or blocks them. Plenty of sites block AI bots by accident in a copied robots file; now you can see at a glance whether yours does.
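
The robots.txt half of that check can be approximated with Python's built-in urllib.robotparser. This sketch assumes you have already downloaded the site's robots.txt as text. The ai_crawler_access helper is hypothetical, and the four-bot list is just the crawlers named above, not the checker's full list. It only asks whether each crawler may fetch the homepage, which is a simplification: a site can allow a bot at the root while blocking specific sections.

```python
from urllib.robotparser import RobotFileParser

# The crawlers named in this post; the real checker covers others too.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def ai_crawler_access(robots_txt: str, site: str) -> dict:
    """Map each AI crawler token to whether robots.txt lets it fetch the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already fetched
    homepage = site.rstrip("/") + "/"
    return {bot: parser.can_fetch(bot, homepage) for bot in AI_CRAWLERS}


# A file that blocks GPTBot by name while allowing everything else.
example = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(ai_crawler_access(example, "https://example.com"))
```

With this example file, only GPTBot comes back as blocked. One caveat: Python's parser applies rules in file order, while Google documents a most-specific-match rule, so files that mix Allow and Disallow for overlapping paths can be judged differently. A matching llms.txt check could be as simple as requesting /llms.txt at the site root and treating a 200 response as present.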

Smarter meta audits at a glance

The meta audit got a usability upgrade that we'd wanted for a while. Previously, when you ran a meta tags audit, the results table just showed a flat "Yes" to say metadata had been collected — which told you nothing about whether it was any good.

Now that column is a colour-coded summary chip for each page: green when everything's clean, amber when there are warnings, red when something critical is missing. Click it and it expands into a full per-field breakdown — title, description, canonical, robots, Open Graph and Twitter — each with its own status and a short note (for example, "Title — 72 chars, over the 60 limit"). It's the difference between knowing metadata exists and knowing whether it's actually working, without having to export a CSV to find out.
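
The chip's colour follows a "worst status wins" rule, which a few lines of Python can sketch. The severities here are assumptions for illustration: this version treats a missing title as critical and an over-long title as a warning, and it implements only the title check, using the 60-character limit from the example note. Other fields are passed in as precomputed results. Status, check_title and summary_chip are hypothetical names.

```python
from enum import IntEnum


class Status(IntEnum):
    # Ordered by severity so max() returns the worst status.
    OK = 0
    WARN = 1
    CRITICAL = 2


CHIP_COLOURS = {Status.OK: "green", Status.WARN: "amber", Status.CRITICAL: "red"}
TITLE_LIMIT = 60  # limit quoted in the post's example note


def check_title(title):
    """Assumed severities: missing title is critical, over-long title is a warning."""
    if not title or not title.strip():
        return Status.CRITICAL, "Title: missing"
    n = len(title)
    if n > TITLE_LIMIT:
        return Status.WARN, f"Title: {n} chars, over the {TITLE_LIMIT} limit"
    return Status.OK, f"Title: {n} chars"


def summary_chip(field_results):
    """Colour for the per-page chip: the most severe status across all fields."""
    worst = max((status for status, _note in field_results.values()), default=Status.OK)
    return CHIP_COLOURS[worst]


results = {
    "title": check_title("x" * 72),
    "description": (Status.OK, "Description: present"),  # precomputed elsewhere
}
print(summary_chip(results))  # amber
print(results["title"][1])    # Title: 72 chars, over the 60 limit
```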

Canonical checks that actually catch problems

This one started as a gap a user spotted, and they were right. The scanner was reading each page's <link rel="canonical"> but never actually checking it — so a missing canonical, or one pointing at the wrong URL, passed completely silently.

That's fixed. The meta audit now validates canonicals two ways. First, presence: pages with no canonical tag get flagged. Second, and more importantly, self-reference: if a page's canonical points at a different URL than the page itself, we flag it — because that's how a page quietly hands its ranking signals to another address, or disappears from the index entirely. The comparison is deliberately forgiving about the things that don't matter (an http-to-https upgrade, a trailing slash, a dropped tracking parameter all count as self-referential) so you only see the mismatches that are genuinely worth investigating.
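
Here is a minimal sketch of that forgiving comparison, assuming Python's standard urllib.parse. It normalises both URLs before comparing: http becomes https, a trailing slash is stripped, and a hypothetical list of tracking parameters is removed. Lower-casing the host, dropping the fragment and resolving relative canonicals against the page URL are this sketch's own assumptions, as are the helper names and the parameter list; the post doesn't say exactly which parameters the scanner ignores.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit

# Hypothetical tracking parameters to ignore when comparing URLs.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "gclid", "fbclid"}


def normalise(url: str) -> str:
    """Reduce a URL so differences that don't matter compare equal."""
    parts = urlsplit(url)
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
    path = parts.path.rstrip("/") or "/"  # "/page/" and "/page" compare equal
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in TRACKING_PARAMS]  # drop tracking parameters
    # Lower-case the host and drop the fragment entirely.
    return urlunsplit((scheme, parts.netloc.lower(), path, urlencode(kept), ""))


def canonical_issue(page_url: str, canonical: Optional[str]) -> Optional[str]:
    """Return a problem description, or None if the canonical is self-referential."""
    if not canonical:
        return "missing canonical"
    resolved = urljoin(page_url, canonical)  # canonicals may be relative
    if normalise(resolved) != normalise(page_url):
        return "canonical points at a different URL"
    return None


print(canonical_issue("http://example.com/page/?utm_source=news",
                      "https://example.com/page"))  # None
```

Any query parameter outside the tracking list still counts, so ?id=2 and ?id=3 are reported as different URLs.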

We made the site itself AI-friendly

In the spirit of eating our own cooking, we also applied the advice to ourselves. SeoSitemap.app now publishes its own llms.txt, and our robots.txt explicitly welcomes the major AI crawlers. It's a small thing, but it would be a bit awkward to ship an AI-friendliness checker and score badly on our own tool — and now we don't.

Getting the data out

A quick reminder on exports, since they pair well with the new checks. Every scan can be exported three ways: a full per-URL CSV inventory, an issues-only CSV with one row per problem (handy for filtering and assigning work in a spreadsheet), and a shareable HTML report. The AI readiness score and its per-check results now ride along in the CSV export too, so you can track readiness across a whole site over time.
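
The difference between the two CSVs is mostly shape: the inventory has one row per URL, while the issues-only file puts every problem on its own row. A minimal Python sketch of that flattening with the standard csv module follows. The input structure, the issues_csv helper and the url/severity/issue column names are assumptions, not the tool's actual export schema.

```python
import csv
import io


def issues_csv(pages):
    """Flatten per-page issues into one CSV row per problem.

    pages: iterable of dicts shaped like
    {"url": str, "issues": [(severity, message), ...]}  (hypothetical shape)
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "severity", "issue"])
    for page in pages:
        for severity, message in page["issues"]:
            writer.writerow([page["url"], severity, message])
    return buf.getvalue()


pages = [
    {"url": "https://example.com/a",
     "issues": [("critical", "missing canonical"), ("warning", "title over 60 chars")]},
    {"url": "https://example.com/b", "issues": []},  # clean pages produce no rows
]
print(issues_csv(pages))
```

Because each row carries its own URL and severity, a spreadsheet filter on the severity column is enough to split the work.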

If you're auditing a large site, remember that the scanner works from your sitemap: it pulls every URL listed there and audits up to 500 pages per run. Pages missing from the sitemap never get checked, so a complete, valid sitemap determines how much of your site is actually covered.
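
The sitemap-driven scope can be sketched with xml.etree.ElementTree: read the loc entries from a standard urlset sitemap and keep the first 500. Taking the first 500 in document order is an assumption, since the post doesn't say which pages make the cut when a sitemap is larger. This sketch also doesn't follow sitemap index files, which list further sitemaps rather than pages. urls_to_audit is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_PAGES = 500  # per-run cap described in the post


def urls_to_audit(sitemap_xml, limit=MAX_PAGES):
    """Return up to `limit` <loc> URLs from a <urlset> sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)  # accepts str or bytes
    locs = [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text and loc.text.strip()]
    return locs[:limit]


entries = "".join(f"<url><loc>https://example.com/p{i}</loc></url>" for i in range(600))
sitemap = f'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">{entries}</urlset>'
print(len(urls_to_audit(sitemap)))  # 500
```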

Key takeaways

The new AI-Friendly Checker scores every page for how readable it is to AI crawlers — extractable text, structured data, semantics and freshness — plus site-wide robots.txt and llms.txt checks.
The meta audit results now show a colour-coded summary chip per page that expands into a full per-field breakdown, instead of a flat "Yes".
Canonical tags are now validated for presence and self-reference, catching missing canonicals and ones that point at the wrong URL.
We made our own site AI-friendly with an llms.txt and explicit AI-crawler access in robots.txt.
The AI readiness score and its per-check results now ride along in the CSV export; the per-URL inventory, issues-only CSV and shareable HTML report are all still available.

