A free llms.txt validator

2026-07-02

turva.dev now has a free llms.txt validator at https://turva.dev/llms-txt-validator. Enter a domain and it fetches that site's /llms.txt, checks the structure against the format and reports each check as pass, warn or fail. Nothing is stored and there is no signup.

What the format asks for

llms.txt is a small format, and that is the point of it. One H1 line names the site. A blockquote under the title carries a one-line summary. H2 sections group markdown links an agent can follow to the content itself. A file that follows this shape gives an agent a map of the site at a fraction of the cost of crawling it.

What the validator checks

The file exists at /llms.txt and answers HTTP 200
The response is plain text, not an HTML page
The first non-empty line is an H1 title
A blockquote summary follows the title
H2 sections group the content
Markdown links parse and use absolute URLs
The file stays small enough to be cheap to read
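
The structural checks in this list can be sketched in a few lines of Python. This is an illustration of the idea, not the validator's own code: the function name, the link regex and the 50 KB size limit are assumptions, and the two HTTP checks are left out because the function only sees the file body.

```python
import re
from urllib.parse import urlparse

# Hypothetical limit; the post does not state the validator's real threshold.
MAX_BYTES = 50_000

# Matches inline markdown links of the form [label](target).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

SAMPLE = """# Example Site

> One-line summary of what the site offers.

## Docs

- [Getting started](https://example.com/docs/start.md): first steps
"""


def check_structure(text: str) -> dict[str, bool]:
    """Run the structural checks on the body of an llms.txt file."""
    non_empty = [line.strip() for line in text.splitlines() if line.strip()]

    # The first non-empty line must be an H1 ("# Title"), not an H2.
    has_h1 = bool(non_empty) and non_empty[0].startswith("# ")

    # The next non-empty line should be the blockquote summary.
    has_summary = has_h1 and len(non_empty) > 1 and non_empty[1].startswith(">")

    has_h2 = any(line.startswith("## ") for line in non_empty)

    # Every link target needs an http(s) scheme and a host to count as absolute.
    urls = [match.group(2) for match in LINK_RE.finditer(text)]
    links_absolute = bool(urls) and all(
        urlparse(url).scheme in ("http", "https") and bool(urlparse(url).netloc)
        for url in urls
    )

    small_enough = len(text.encode("utf-8")) <= MAX_BYTES

    return {
        "h1_title": has_h1,
        "blockquote_summary": has_summary,
        "h2_sections": has_h2,
        "links_absolute": links_absolute,
        "small_enough": small_enough,
    }
```

Calling check_structure(SAMPLE) passes every check in this sketch, while a file whose links are relative paths fails only links_absolute. A real validator would likely distinguish warn from fail; the sketch collapses both into False.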

The second check earns its place. A site that returns its 404 page with status 200 looks like it has an llms.txt until something actually reads it, and an agent that fetches markup where it expected markdown wastes its tokens on tags.
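
One way to catch that soft 404 is to look at both the Content-Type header and the first bytes of the body. The sketch below is an assumption about how such a check could work, not a description of the validator's implementation; the function name and the 200-character sniff window are illustrative.

```python
def looks_like_html(content_type: str, body: str) -> bool:
    """Return True when a response served at /llms.txt is really an HTML page."""
    # Content-Type may carry parameters, e.g. "text/html; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return True

    # Some servers mislabel their pages, so sniff the start of the body too.
    head = body.lstrip()[:200].lower()
    return head.startswith(("<!doctype html", "<html"))
```

Checking the body as well as the header matters because a server can label an error page as text/plain and still send markup.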

Agents can use it too

The same URL also returns JSON. Send an Accept: application/json header with a url query parameter and the checks come back as data, so the validator works in a script or an agent pipeline as well as in a browser:

curl -H "Accept: application/json" "https://turva.dev/llms-txt-validator?url=example.com"
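
The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names and the timeout are assumptions, and because the post does not describe the response schema, the code returns the decoded JSON without interpreting any fields.

```python
import json
import urllib.parse
import urllib.request

VALIDATOR_URL = "https://turva.dev/llms-txt-validator"


def build_request(domain: str) -> urllib.request.Request:
    """Build the GET request the curl command above sends."""
    query = urllib.parse.urlencode({"url": domain})
    return urllib.request.Request(
        f"{VALIDATOR_URL}?{query}",
        headers={"Accept": "application/json"},
    )


def validate(domain: str, timeout: float = 15.0):
    """Send the request and decode the JSON body as-is."""
    with urllib.request.urlopen(build_request(domain), timeout=timeout) as response:
        return json.load(response)
```

Calling validate("example.com") performs the network request. Splitting out build_request keeps the URL and header logic testable without touching the network.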

One build note

The first deploy failed its own self check. A Cloudflare Worker cannot fetch a URL on its own zone, so asking the validator about turva.dev started a request that could never return and timed out after eight seconds. The fix reads the same constant that serves /llms.txt instead of fetching it. External domains are fetched normally, and the validator was proven against the llmstxt.org file before this post went out.
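
The shape of that fix can be sketched as a short-circuit on the validator's own hostname. The real service runs as a Cloudflare Worker, so the Python below only illustrates the logic: OWN_LLMS_TXT is a placeholder for the constant the post mentions, and load_llms_txt and its fetch argument are hypothetical names.

```python
from typing import Callable
from urllib.parse import urlparse

OWN_HOST = "turva.dev"
# Placeholder standing in for the constant that also serves /llms.txt.
OWN_LLMS_TXT = "# turva.dev\n\n> Placeholder summary.\n"


def load_llms_txt(target: str, fetch: Callable[[str], str]) -> str:
    """Return the llms.txt body for target without ever fetching our own host."""
    # Accept both "example.com" and "https://example.com/some/path".
    parsed = urlparse(target if "://" in target else f"https://{target}")
    host = (parsed.hostname or "").lower()

    if host == OWN_HOST:
        # Read the constant directly instead of making a request to our own zone.
        return OWN_LLMS_TXT

    # External domains go through the normal network fetch.
    return fetch(f"https://{host}/llms.txt")
```

Passing the fetch function in as an argument makes the branch easy to exercise with a stub, which is one way a self check like the one that failed could be run before deploying.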

What it is not

The validator reads one file and checks its shape. It does not measure whether agents can discover the site, read its pages as markdown, find its API or complete a purchase. That is audit territory, and an audit here runs a site against two independent scanners rather than one checklist.

For an audit of the whole surface an agent sees, not just this one file, contact info@turva.dev.