llms.txt — Complete Implementation Guide
Last updated: June 1, 2026
What this guide is and is not. This is the long-form reference for llms.txt: the convention, its structure, validation, and an honest section on what llms.txt is NOT. For a shorter, task-focused page on common llms.txt patterns, see llms.txt Best Practices. The core of this guide is §3, "What llms.txt is NOT", which addresses the common assumption that llms.txt is a ranking or visibility signal.
1. What llms.txt is
llms.txt is a community-proposed convention for a human-readable Markdown file at a website's root that lists the site's most important content for language-model consumers. The convention is published at llmstxt.org. It is not an IETF or W3C standard; it is a convention.
The file is plain Markdown. It typically starts with the site's name as an H1, a brief overview block, and a series of section headers (## Products, ## Docs, ## Articles, etc.) under which the relevant URLs are listed as bulleted links. The format is deliberately simple so a human can author it and an LLM-based tool can parse it without ambiguity.
The convention was proposed in 2024 by Jeremy Howard and gained adoption among technical-documentation sites, libraries, and SaaS products in the months that followed. The reference site llmstxt.org hosts the canonical spec and links to directories of sites that have adopted it.
2. What it is for
llms.txt serves two related goals.
- Inventory clarity for LLM consumers. An LLM agent or tool that needs to understand a site's content can fetch one file and get a curated summary instead of crawling the full site. This is faster and produces more accurate results when the agent's job is to answer "what does this site have on topic X."
- Editorial discipline for the site operator. Maintaining llms.txt forces you to decide which URLs are top-level, which sections are public-facing, and which inventory matters. The exercise is useful regardless of LLM adoption.
helperg.com publishes /llms.txt as a worked example. It lists the ecosystem, products, docs, tools, articles, and comparison topics. See §5 for the structure.
3. What llms.txt is NOT
This section is deliberate. llms.txt is consistently misframed in community-circulated content. The honest framing is uncomfortable but necessary.
llms.txt is NOT a documented ranking signal for any major AI provider. Google, OpenAI, Anthropic, and Perplexity have not, as of the date this page was written, published a statement that they consume llms.txt as a ranking, retrieval, or training input. Adding llms.txt does not influence Google Search ranking. It does not increase the likelihood of citation by Perplexity, ChatGPT, or Claude in any documented way. The convention is useful for the reasons in §2, not as a ranking lever.
- It is not a standard. There is no IETF RFC or W3C specification for llms.txt.
- It is not a replacement for sitemap.xml. See §6.
- It is not a replacement for robots.txt. See §7.
- It is not consumed by most LLM-based tools today. Adoption is growing but far from universal.
- It does not opt your content into or out of AI training. Crawl access is controlled by robots.txt; llms.txt does not change that.
- It is not a credential or proof of trustworthiness. Anyone can publish an llms.txt; the file makes no claim about authority.
If you are evaluating llms.txt, evaluate it for what it actually does — inventory clarity, editorial discipline, optional LLM-tool friendliness — and not for benefits no provider has committed to.
4. Structure and conventions
The llmstxt.org spec recommends a structure with these sections:
- Title (H1) — the site or organization name.
- Overview block (blockquote or paragraph) — one or two sentences describing what the site is and what readers will find.
- Topical sections (H2) — Products, Docs, Tools, Articles, etc. — under which top-level URLs are bulleted.
- Optional H3 subsections — for clusters within a topical section when the volume warrants.
A minimum viable llms.txt:
# EXAMPLE LLC

> EXAMPLE LLC is a small operator providing technical tooling. The site
> publishes a product catalog, technical documentation, and articles.

Site: https://example.com
Contact: hello@example.com

## Products

- Widget Pro: https://example.com/products/widget-pro
- Widget Mini: https://example.com/products/widget-mini

## Docs

- Quickstart: https://example.com/docs/quickstart
- API reference: https://example.com/docs/api

## Articles

- How widgets work: https://example.com/articles/how-widgets-work
- Choosing the right widget: https://example.com/articles/choosing
Keep the file focused. The point is to expose top-level surfaces, not to mirror your full sitemap.
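The structure described in this section can be spot-checked mechanically. The following is a minimal shell sketch, not an official validator: the function name `check_llms_structure` and the three checks it runs (an H1 on the first non-blank line, at least one H2 section, at least one bulleted entry that carries a URL) are illustrative assumptions derived from the structure listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough structural check for a local llms.txt file.
# It mirrors the structure described in this guide; it is not an official validator.
check_llms_structure() {
  local file="$1" status=0 first_line h2_count link_count

  if [ ! -r "$file" ]; then
    echo "FAIL: cannot read $file"
    return 2
  fi

  # The first non-blank line should be an H1 such as "# EXAMPLE LLC".
  first_line=$(grep -v '^[[:space:]]*$' "$file" | head -n 1 || true)
  if ! printf '%s\n' "$first_line" | grep -qE '^# +[^[:space:]]'; then
    echo "FAIL: first non-blank line is not an H1"
    status=1
  fi

  # At least one topical H2 section ("## Products", "## Docs", ...).
  h2_count=$(grep -cE '^## +[^[:space:]]' "$file" || true)
  if [ "$h2_count" -lt 1 ]; then
    echo "FAIL: no H2 sections found"
    status=1
  fi

  # At least one bulleted entry that carries a URL.
  link_count=$(grep -cE '^[-*] .*https?://' "$file" || true)
  if [ "$link_count" -lt 1 ]; then
    echo "FAIL: no bulleted entries with a URL found"
    status=1
  fi

  if [ "$status" -eq 0 ]; then
    echo "OK: $h2_count sections, $link_count linked entries"
  fi
  return "$status"
}

# Usage: check_llms_structure ./llms.txt
```

A passing result only means the file has the expected skeleton. Whether the listed URLs are the right ones remains an editorial question.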
5. Worked example — helperg.com's llms.txt
The current helperg.com /llms.txt demonstrates the convention applied to a multi-product operator. Excerpt:
# HELPERG LLC
> HELPERG LLC builds practical, lightweight, SEO-first digital tools...
Site: https://helperg.com
Contact: info@helperg.com
Operator: HELPERG LLC
## Ecosystem overview
- WebmasterID — analytics: https://webmasterid.com
- Cash Workspace — finance workspace: https://cashworkspace.com
- Mobile apps — see Products
## Products
- Products hub: https://helperg.com/products/
- PDF Editor: https://helperg.com/products/pdf-editor.html
- ...
## Docs
- Docs hub: https://helperg.com/docs/
- ...
## Free SEO tools
- Tools hub: https://helperg.com/tools/
- Meta Tag Checker: https://helperg.com/tools/meta-tag-checker.html
- ...
## Articles
- Articles hub: https://helperg.com/articles/
- ...
The file follows the llmstxt.org structure: title, overview, contact and operator strings, then topical sections. Length is bounded by what is genuinely top-level, not by the number of URLs the site has.
6. Relationship to sitemap.xml
sitemap.xml is the standardized, machine-readable inventory of crawlable URLs, defined by the sitemaps.org protocol and supported by every major search engine. No search engine requires it, but it is effectively expected for SEO. Its purpose is to help crawlers discover URLs comprehensively.
llms.txt is a curated, human-readable Markdown summary intended for language-model consumers and humans reviewing what an LLM might see. Its purpose is editorial clarity, not exhaustive enumeration.
| Aspect | sitemap.xml | llms.txt |
|---|---|---|
| Standardization | sitemaps.org protocol | Community convention |
| Format | XML | Markdown |
| Audience | Search crawlers | LLM tools + humans |
| Scope | Comprehensive (every crawlable URL) | Curated (top-level surfaces) |
| Required? | Effectively yes for SEO | Optional convention |
| Submittable in Search Console? | Yes | No (not a search input) |
Both can coexist. They serve different audiences and do not conflict. See the sitemap.xml — Complete Guide for the sitemap reference.
7. Relationship to robots.txt
robots.txt implements the Robots Exclusion Protocol (RFC 9309) for crawler access control: it gates crawl access. llms.txt is a content inventory: it advertises content. The two serve different purposes:
- robots.txt says "please don't crawl path X."
- llms.txt says "here is a curated list of important content on this site."
An llms.txt file does not override robots.txt. A crawler that honors robots.txt keeps applying its rules even to URLs listed in llms.txt, so listing a disallowed path in llms.txt does not grant access to it. The two files are independent. See the robots.txt — Complete Guide for the protocol reference.
8. Common mistakes
- Treating llms.txt as a ranking lever. It is not. Adding the file does not influence Google Search ranking or documented AI-provider behavior. Add it for inventory clarity, not for visibility hopes.
- Duplicating sitemap.xml. The two files have different audiences. llms.txt should be curated to top-level surfaces; sitemap.xml should be comprehensive. Copying every URL from the sitemap into llms.txt only adds noise.
- Including non-public URLs. If a URL is behind authentication or otherwise not for public consumption, it does not belong in llms.txt.
- Stale entries. Any URL listed in llms.txt should resolve. Stale entries undermine the file's purpose.
- Putting the file at the wrong location. llms.txt belongs at the host root. Subdirectory placement is not the convention.
- Verbose entries. The convention is bulleted links with optional one-line context. Long descriptions per entry defeat the at-a-glance purpose.
- Inconsistent host scheme. If your canonical scheme is HTTPS, all URLs in llms.txt should be HTTPS.
- Missing a content owner. Without an editorial owner, llms.txt drifts. Assign someone to refresh it on a documented cadence.
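Two of the mistakes above, inconsistent scheme and verbose entries, lend themselves to a quick mechanical check. The sketch below is illustrative only: the function names are hypothetical, and the 160-character default used to flag a "verbose" entry is an arbitrary assumption, not part of the convention.

```shell
#!/usr/bin/env bash
# Hypothetical lint helpers for a local llms.txt file.

# Print each distinct URL that uses plain http:// instead of https://.
# The pattern stops at whitespace, ")" and ">" so Markdown link syntax
# does not leak into the reported URL.
list_insecure_urls() {
  grep -oE 'http://[^[:space:])>]+' "$1" | sort -u
}

# Print bulleted entries longer than a chosen length, prefixed with their
# line number. The default of 160 characters is an assumed threshold.
list_long_entries() {
  local file="$1" max="${2:-160}"
  awk -v max="$max" '/^[-*] / && length($0) > max { print NR ": " $0 }' "$file"
}

# Usage:
#   list_insecure_urls ./llms.txt
#   list_long_entries ./llms.txt 120
```

Empty output from both functions means neither issue was detected. The remaining mistakes in the list (non-public URLs, missing ownership) need human review.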
9. Validation workflow
There is no formal validator because there is no formal specification. Practical validation comes down to three steps:
Step 1 — Markdown parse
Any Markdown parser can confirm the file's syntactic validity. The file should render cleanly in a Markdown viewer.
Step 2 — Link resolution
Every URL in llms.txt should return HTTP 200, either directly or after following redirects (301, 302, 307, or 308). Stale or broken URLs undermine the file's editorial purpose.
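# Extract every URL in llms.txt and print the final HTTP status code for each.
# The pattern stops at whitespace, ")" and ">" so Markdown link syntax is not
# captured as part of the URL.
curl -s https://example.com/llms.txt \
  | grep -oE 'https?://[^[:space:])>]+' \
  | sort -u \
  | while IFS= read -r url; do
      # -I sends a HEAD request, -L follows redirects, and -w prints the
      # status code of the final response (000 if the request failed).
      code=$(curl -s -o /dev/null -I -L --max-time 8 -w '%{http_code}' "$url")
      printf "%-4s %s\n" "$code" "$url"
    done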
Step 3 — Editorial review
The file's actual quality is whether the listed URLs are still the most important surfaces of the site. This is judgment, not validation. Schedule a review at a documented cadence.
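A review cadence is easier to keep when something flags that it has lapsed. The following is a rough sketch under stated assumptions: the file is tracked in Git, a 90-day window stands in for a quarterly cadence, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical review-cadence helpers; the 90-day default is an assumption.

# Succeed (meaning "stale") when more than max_days separate two Unix timestamps.
is_stale() {
  local last_epoch="$1" now_epoch="$2" max_days="${3:-90}"
  local age_days=$(( (now_epoch - last_epoch) / 86400 ))
  [ "$age_days" -gt "$max_days" ]
}

# Print the Unix timestamp of the last commit that touched the file.
last_commit_epoch() {
  git log -1 --format=%ct -- "${1:-llms.txt}"
}

# Usage, run inside the site's Git repository:
#   if is_stale "$(last_commit_epoch llms.txt)" "$(date +%s)"; then
#     echo "llms.txt is due for editorial review"
#   fi
```

This only detects that nobody has touched the file recently. It cannot tell whether the listed URLs are still the site's most important surfaces, which remains a judgment call for the reviewer.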
10. Checklist
- llms.txt exists at the host root (https://example.com/llms.txt).
- The file returns HTTP 200.
- Content-Type is text/plain or text/markdown.
- The file uses UTF-8 encoding.
- The H1 names the site or organization.
- An overview block (blockquote or paragraph) follows the H1.
- Site URL is declared.
- Contact email or URL is declared.
- Topical H2 sections organize the content (Products, Docs, Tools, Articles, etc.).
- URLs in the file resolve to HTTP 200.
- The file lists top-level surfaces, not every URL on the site.
- HTTPS is used consistently across all listed URLs.
- No authentication-walled or non-public URLs are listed.
- The file is reviewed on a documented cadence (quarterly is a reasonable default).
- An editorial owner is named in your internal documentation.
- The file is version-controlled with the rest of the site.
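The first three checklist items (location, status code, Content-Type) can be checked with a single request. This is a minimal sketch rather than a definitive test: the function names are hypothetical, and it relies on curl's `--write-out` variables `http_code` and `content_type`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers covering the status-code and Content-Type checklist items.

# Succeed when a Content-Type value is text/plain or text/markdown.
# Parameters such as "; charset=utf-8" are allowed; matching is case-insensitive.
content_type_ok() {
  local ctype
  ctype=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$ctype" in
    text/plain*|text/markdown*) return 0 ;;
    *) return 1 ;;
  esac
}

# Fetch a URL and report its final status code and Content-Type.
check_llms_endpoint() {
  local url="$1" result code ctype
  # -L follows redirects; -w prints values taken from the final response.
  result=$(curl -sL --max-time 10 -o /dev/null \
    -w '%{http_code} %{content_type}' "$url" || true)
  code=${result%% *}   # text before the first space
  ctype=${result#* }   # text after the first space
  if [ "$code" = "200" ] && content_type_ok "$ctype"; then
    echo "OK: $code $ctype"
  else
    echo "FAIL: status=$code content-type=$ctype"
    return 1
  fi
}

# Usage: check_llms_endpoint https://example.com/llms.txt
```

The remaining items on the list, such as editorial ownership and review cadence, are process checks that a script cannot confirm.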
11. FAQ
What is llms.txt?
llms.txt is a community-proposed convention published at llmstxt.org for a human-readable Markdown file at the root of a website that lists the site's most important content for language-model consumers. It is not an IETF or W3C standard. It is not a documented signal that any major AI provider claims to consume.
Does Google, OpenAI, Anthropic, or Perplexity consume llms.txt?
Not as a documented input. As of the date this page was written, none of those providers have published a statement committing to consume llms.txt as a ranking, retrieval, or training signal. Some LLM-based tools and agents do parse it as an inventory file. Treat llms.txt as a low-cost good-citizen artifact, not as a ranking input.
Is llms.txt a replacement for sitemap.xml?
No. sitemap.xml is the standardized, machine-readable inventory of crawlable URLs, defined by the sitemaps.org protocol and supported by every major search engine. llms.txt is an optional Markdown file consumed by some LLM tools and humans. They serve different audiences and do not conflict.
Is llms.txt a replacement for robots.txt?
No. robots.txt is the RFC 9309 protocol for crawler access control. llms.txt is a content inventory. robots.txt gates access; llms.txt advertises content. The two serve different purposes and do not overlap.
Where should llms.txt live?
At the site root: https://example.com/llms.txt. The convention follows the robots.txt placement convention so that LLM tools looking for an inventory know where to fetch it.
What is llms-full.txt?
An optional extension of the convention proposed by llmstxt.org for a longer file that includes more detailed content excerpts. Most sites use only llms.txt; llms-full.txt is appropriate for sites with substantial reference content that benefits from inline summaries.
Can llms.txt be validated?
There is no formal validation because there is no formal specification. The convention is Markdown, so any Markdown parser confirms syntactic validity. Semantic validity (the file accurately lists current site content) is the operator's responsibility.
Does llms.txt change Google search ranking?
Google has not published a statement that llms.txt is a ranking signal. There is no documented mechanism by which an llms.txt file would influence Google Search rankings. Add it for the reasons it makes sense — inventory clarity, LLM-tool friendliness — not because it ranks.
12. Sources
- llmstxt.org — the convention — captured 2026-06
- RFC 9309 — Robots Exclusion Protocol — captured 2026-06
- sitemaps.org — Sitemap protocol — captured 2026-06
- Google Search Central — Sitemap overview — captured 2026-06
- Google Search Central — Robots intro — captured 2026-06
- OpenAI — Bots overview — captured 2026-06
- Anthropic — Crawler documentation — captured 2026-06