Documentation

Everything you need to get the most out of SeoSitemap.app — how each scanner works, how to read the results, and how to fix the issues it finds.

Who this is for

SeoSitemap.app is built for anyone who needs to audit a website end-to-end, not just a single page: in-house marketers reviewing their own site, freelance SEO consultants vetting a client, developers checking their work before launch. The tools are intentionally simple to use, but the data they surface is the same data the big paid auditors charge for — heading hierarchy, meta tag health, link status, content cannibalisation, page weight.

The five tools

Each tool focuses on one category of SEO check. You can run them independently, and you can layer additional checks on any of them through the checkboxes above the scan button.

Sitemap Checker

The flagship tool. Paste any website URL or a direct sitemap.xml link; the scanner pulls every listed URL — following nested sitemap-index files and decompressing .xml.gz sitemaps automatically — and audits each page's H1–H6 hierarchy. Use it as your first stop on any audit; it gives you a sitewide map of which pages are healthy, which need work, and which are flat-out broken.
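
The sitemap-walking step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the scanner's actual code: the function names and the caller-supplied fetch callback are hypothetical, and gzip detection by magic bytes is an assumption about how compressed sitemaps might be recognised.

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def local_name(tag: str) -> str:
    """Strip the XML namespace: '{ns}urlset' -> 'urlset'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(raw: bytes):
    """Return ("index" or "urlset", list of <loc> values) for one sitemap file."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    kind = "index" if local_name(root.tag) == "sitemapindex" else "urlset"
    locs = []
    for entry in root:          # <sitemap> or <url> elements
        for child in entry:     # direct children only, so nested image locs are skipped
            if local_name(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return kind, locs


def collect_page_urls(start_url: str, fetch) -> list:
    """Walk nested sitemap indexes. `fetch(url) -> bytes` is supplied by the caller."""
    pages, queue, seen = [], [start_url], set()
    while queue:
        url = queue.pop(0)
        if url in seen:         # guard against indexes that reference each other
            continue
        seen.add(url)
        kind, locs = parse_sitemap(fetch(url))
        if kind == "index":
            queue.extend(locs)
        else:
            pages.extend(locs)
    return pages
```

Passing the fetch function in keeps the walk independent of any particular HTTP client, which also makes it easy to try against sitemap files saved on disk.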

SEO Checklist

An interactive checklist of technical, on-page, and analytics best practices that your site should pass. Each item has a tick-box and your progress is saved in local storage, so you can come back days later and pick up where you left off. Use it as a pre-launch gate or a quarterly health check.

Meta Tags Checker

Focused audit of <title>, meta description, canonical, robots directive, and Open Graph / Twitter card tags across every page in your sitemap. Catches accidental noindex leaks, truncated titles, missing OG images, and inconsistent canonicals in one pass. The canonical check validates both presence and self-reference, so a page whose canonical points at a different URL (quietly handing away its ranking signals) gets flagged, not just a page with a missing tag.
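
To make the self-reference idea concrete, here is a minimal Python sketch. The normalisation rules (case-insensitive scheme and host, an empty path treated as "/", fragments ignored) are assumptions chosen for illustration; the checker's real comparison rules are not documented here, and the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def normalise(url: str):
    """Reduce a URL to the parts compared in this sketch (assumed rules)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # The fragment is deliberately left out of the comparison.
    return (parts.scheme.lower(), parts.netloc.lower(), path, parts.query)


def canonical_status(page_url: str, canonical_href: Optional[str]) -> str:
    """Classify a page's canonical tag as missing, self-referencing, or pointing elsewhere."""
    if not canonical_href or not canonical_href.strip():
        return "missing"
    # Relative hrefs such as "/blog/post" are resolved against the page URL.
    absolute = urljoin(page_url, canonical_href.strip())
    if normalise(absolute) == normalise(page_url):
        return "self"
    return "points-elsewhere"
```

Under these assumptions a paginated URL such as /blog/post?page=2 whose canonical is /blog/post counts as pointing elsewhere, which may or may not be intentional on a given site.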

Content Analysis

Looks at what is on the page rather than what wraps around it: word count per page, duplicate H1 or Title texts across the whole site (keyword cannibalisation), images missing alt attributes, and the lang attribute on the root <html> element.
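
Duplicate detection across a site comes down to grouping pages by their H1 or title text. The sketch below assumes that texts are compared after collapsing whitespace and ignoring case; that matching rule and the function name are illustrative guesses, not the tool's documented behaviour.

```python
from collections import defaultdict


def find_duplicates(pages: dict) -> dict:
    """Map each repeated text to the URLs sharing it. `pages` is URL -> H1 or title."""
    groups = defaultdict(list)
    for url, text in pages.items():
        key = " ".join(text.split()).casefold()  # collapse whitespace, ignore case
        if key:                                  # empty texts are a separate problem
            groups[key].append(url)
    return {key: urls for key, urls in groups.items() if len(urls) > 1}
```

Each group returned is a set of candidate pages to merge, redirect, or differentiate.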

AI-Friendly Checker

Scores how readable each page is to AI crawlers and answer engines (ChatGPT, Perplexity, Google's AI Overviews). Because most AI crawlers don't run JavaScript, the checker reads the raw HTML — the same thing they see — and gives each page a 0–100 readiness score across extractable content, structured data (JSON-LD), title and description, heading outline, a semantic <main> and a freshness signal. It also runs two site-wide checks once per scan: whether your robots.txt allows the major AI crawlers and whether you publish an llms.txt. See the GEO guide for the background.
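
The robots.txt part of the site-wide check can be approximated with Python's standard library. In this sketch the list of user-agent tokens is an assumption (the checker's actual list is not given here), and the function name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens; the tool's real list may differ.
AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]


def ai_crawler_access(robots_txt: str, page_url: str) -> dict:
    """Return {token: allowed?} for one URL, given the text of robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, page_url) for agent in AI_CRAWLERS}
```

For example, a robots.txt that disallows GPTBot but allows everyone else would report False for GPTBot and True for the remaining tokens.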

How to read the results

Every scanner returns one row per analysed URL. The columns and colour coding follow the same convention across every tool, so once you've used one you've used them all.

Heading structure

Each row shows a compact summary of the page's heading shape (for example H1×1 · H2×3 · H3×5); the H1 marker turns red when the count is anything other than exactly 1, because every page should have a single H1 that summarises its content. Click show tree on any row — or Expand all at the top — to open the full heading tree: every H1–H6 in document order, indented by level, with skipped levels (an H2→H4 jump) flagged inline so you can see exactly where the hierarchy breaks.
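
The two rules above (exactly one H1, no skipped levels) are simple enough to sketch with Python's built-in HTML parser. This is an illustration under the assumption that a skip means any step that goes more than one level deeper than the previous heading; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) for every H1-H6 in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level, self._text = int(tag[1]), []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def audit_headings(html: str):
    """Return (headings, h1_ok, skips) for one page of raw HTML."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = [level for level, _ in collector.headings]
    h1_ok = levels.count(1) == 1
    # A skip is a step that goes more than one level deeper, e.g. H2 -> H4.
    skips = [(a, b) for a, b in zip(levels, levels[1:]) if b - a > 1]
    return collector.headings, h1_ok, skips
```

Moving back up the outline (for instance H4 followed by H2) is not treated as a skip in this sketch, since closing a subsection is normal.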

Errors column

Issues found on the page, grouped into two severities:

  • Critical (red) — likely to hurt indexing or rankings directly: missing H1, missing title, missing description, 4xx/5xx status code, keyword cannibalisation.
  • Warning (yellow) — worth fixing but not always urgent: skipped heading levels, slightly-too-long title, redirect chains, missing alt attributes, slow TTFB.

You can filter the table to show only critical issues, only warnings, or every page with any issue at all — useful when you want to triage from worst to least bad.

Status and TTFB

When the "Link Health Status" checkbox is on, every row gets an HTTP status code (200 / 301 / 404 / 500…) and a Time-to-First-Byte measurement in milliseconds. TTFB is colour-coded: green under 500 ms, yellow up to 1 s, red above. Slow servers are one of the most under-diagnosed SEO problems, and this column makes them obvious.
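
As a small illustration, the TTFB colour bands can be written as a single function. How the exact boundary values (500 ms and 1000 ms) are classified is an assumption in this sketch, as is the function name.

```python
def ttfb_band(ttfb_ms: float) -> str:
    """Map a Time-to-First-Byte in milliseconds to a colour band."""
    if ttfb_ms < 500:
        return "green"
    if ttfb_ms <= 1000:   # assumed: exactly 1 s still counts as yellow
        return "yellow"
    return "red"
```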

Meta summary

With the meta audit on, the Meta column shows a colour-coded chip per page — green when everything's clean, amber for warnings, red when something critical is missing — rather than a plain "collected" flag. Click the chip to expand a per-field breakdown of title, description, canonical, robots, Open Graph and Twitter, each with its own status and a short note (for example, the exact title length when it's over the limit).

AI Ready score

With the AI Readiness check on, each row gets a 0–100 score chip: green from 80 up, amber from 50, red below. Click it to expand the per-check breakdown (extractable content, structured data, metadata, heading outline, semantic landmark, language and freshness) so you can see exactly what's holding a page back. The score is also included in the CSV export.

Features that make scans painless

The scanner is designed around the way real audits actually happen — you don't want to babysit a progress bar for ten minutes.

  • Background scanning. The scan keeps running as you navigate around the app. Switch to the documentation, leave a tab open, browse another tool — when the run completes, a toast notification tells you the report is ready.
  • Smart concurrency. We deliberately cap parallel requests at five. That keeps your origin server from being hammered, keeps us friendly to the sites we audit, and keeps individual scans well under serverless function timeouts.
  • Single-URL mode. Don't have a sitemap, or want to debug one specific page? Tick the "Analyze single URL only" box and paste the page URL directly. The same checks run, just on one URL.
  • Sortable, filterable table. Click any column header to sort. Use the filter dropdown to narrow down to critical-only or warnings-only. Pagination (50 rows per page) keeps the DOM responsive even on big scans.
  • CSV export. Hit "Export to CSV" to download everything visible in the table. The file uses semicolon delimiters and a UTF-8 BOM, so it opens cleanly in Excel and Google Sheets without garbled characters.
  • Private by default. All scan results live in your browser session only. Nothing is persisted on our servers, and a page refresh clears everything. Export to CSV if you want a permanent copy.
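
If you want to reproduce the CSV format in your own scripts, the semicolon delimiter and UTF-8 BOM mentioned under CSV export can be sketched like this in Python. The column names used in the example are hypothetical; the real export's columns depend on which checks were enabled.

```python
import csv
import io


def export_csv(rows: list, columns: list) -> bytes:
    """Serialise rows (dicts) as semicolon-delimited CSV with a UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter=";", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)
    # The "utf-8-sig" codec prepends the byte order mark EF BB BF.
    return buffer.getvalue().encode("utf-8-sig")
```

Fields that themselves contain a semicolon are quoted automatically by the csv module, so a title such as "Café; menu" stays in one cell.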

Best practices and quick wins

The patterns below are the ones that come up again and again when auditing real sites. Fixing them rarely takes long and is usually among the highest-return work on an audit list:

  • Start with critical errors. Filter the results to critical only and clear that list first. Missing H1s, missing titles, and 404 pages in your sitemap have the highest impact for the least effort.
  • Hunt cannibalisation early. Two pages with the same H1 or title rarely outperform one strong page. When the Content Analysis tool flags duplicates, decide whether to merge, redirect, or differentiate them. Don't leave them to fight each other.
  • Watch heading hierarchy. A skip from H1 straight to H3 isn't always fatal, but it's a signal that the page outline drifted as the template grew. Tidy it up: it helps screen readers, future editors, and Google.
  • Re-scan after every fix. The scanner is fast enough that you should treat it as a feedback loop, not a one-time audit. Fix something, re-scan, watch the warning disappear.

Limitations

A scan is capped at 500 pages per run so we stay inside our free-tier compute budget. Sitemaps with more URLs are truncated to the first 500. Closing the browser tab aborts the scan, because the orchestration runs client-side.
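
The 500-page cap, combined with the batches of five described under Smart concurrency, amounts to a loop like the one below. The real orchestration runs in the browser, so this Python version is only an illustration of the scheduling logic; the scan function and the caller-supplied audit_page coroutine are hypothetical names.

```python
import asyncio

MAX_PAGES = 500   # per-run cap
BATCH_SIZE = 5    # parallel requests per batch


async def scan(urls, audit_page):
    """Audit at most MAX_PAGES URLs, BATCH_SIZE at a time, preserving order."""
    capped = urls[:MAX_PAGES]          # anything past the cap is dropped
    results = []
    for start in range(0, len(capped), BATCH_SIZE):
        batch = capped[start:start + BATCH_SIZE]
        # Each batch must finish before the next one starts, so a single
        # slow page holds up the other four in its batch.
        results.extend(await asyncio.gather(*(audit_page(u) for u in batch)))
    return results
```

Batching this way is simple to reason about, at the cost of idle slots while a batch waits on its slowest page; that trade-off is why progress can appear to pause on slow servers.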

Troubleshooting

A few things to try when something doesn't go to plan:

  • "Failed to parse sitemap" error. Open the sitemap URL in a browser. If it returns a 404 or an HTML page instead of XML, your sitemap link is wrong. Try the conventional /sitemap.xml, or check /robots.txt for a Sitemap: line giving the real location.
  • All pages return 4xx or 5xx. The most likely cause is that the target site is blocking us as a bot. Some sites use Cloudflare or similar bot challenges that defeat any external scanner. There isn't a workaround on our side; we don't impersonate browsers.
  • Scan looks "stuck". We process pages in batches of five, so when a target server is slow, progress can pause for several seconds at a time. Give it a minute. If you genuinely need to bail out, hit Cancel — the scanner stops and shows what it already found.
  • "This URL points to a private address" error. The scanner refuses to fetch private / loopback / cloud-metadata addresses for security reasons. If you're trying to scan a staging site behind a VPN or on localhost, this won't work — deploy somewhere publicly reachable first.
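
For readers curious what a private-address guard looks like, here is a minimal Python sketch using the standard ipaddress module. The exact set of ranges the scanner rejects is an assumption, and both function names and the caller-supplied resolve callback are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_address(ip_text: str) -> bool:
    """True for private, loopback, link-local, or reserved IP addresses."""
    ip = ipaddress.ip_address(ip_text)
    # 169.254.169.254, a common cloud-metadata address, is link-local.
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved


def host_is_blocked(url: str, resolve) -> bool:
    """`resolve(hostname) -> list of IP strings` is supplied by the caller."""
    host = urlsplit(url).hostname
    if host is None:
        return True   # nothing to resolve; refuse rather than guess
    # Reject if any resolved address is non-public.
    return any(is_blocked_address(ip) for ip in resolve(host))
```

Checking the resolved addresses rather than the hostname text matters because a public-looking domain can still point at an internal IP.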
