Test and validate any robots.txt file
This free robots.txt tester and validator lets you check the syntax of any robots.txt file, simulate how search engine crawlers will read it, and verify whether specific URLs are allowed or blocked. Paste your file directly, or fetch it from a live URL, then run rules against Googlebot, Bingbot, and other user-agents to find conflicts before they cause indexing problems.
A small mistake in robots.txt can quietly remove pages from Google for weeks. Testing the file before every deploy is one of the cheapest SEO insurance policies available.
How does a robots.txt tester work?
A robots.txt tester (sometimes called a robots.txt checker or robots.txt validator) parses the directives in your file and runs them against URLs you provide. For each URL, the tool reports whether the file allows or disallows that path for the user-agent you selected, and which specific rule produced the result. A good tester will also flag syntax errors, unknown directives, and rules that conflict with each other.
The tester above uses the standard Robots Exclusion Protocol (RFC 9309), which is the same specification Googlebot, Bingbot, and most legitimate crawlers follow.
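As a rough illustration of what a tester does internally, the sketch below uses Python's standard-library urllib.robotparser to parse pasted content and check two URLs. The check helper and the sample rules are hypothetical and are not this tool's implementation. Note that the standard-library parser applies the first matching rule in file order rather than the longest match, so its verdict may differ from Googlebot's when Allow and Disallow rules overlap.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample file used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/
"""


def check(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "https://www.example.com/admin/login",
    "https://www.example.com/blog/post",
):
    verdict = "allowed" if check(ROBOTS_TXT, "Googlebot", url) else "blocked"
    print(f"{url}: {verdict}")
```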
How do I validate my robots.txt syntax?
To validate robots.txt syntax, the tool checks three things at once:
- Structural validity: every Disallow or Allow line belongs to a User-agent group, the file uses LF or CRLF line breaks, and there are no stray characters.
- Directive correctness: User-agent, Disallow, Allow, Sitemap, and Crawl-delay are recognized; misspellings (Dissallow) and lines missing the colon after the directive name are flagged. Crawl-delay is not part of RFC 9309, and Googlebot ignores it.
- Rule logic: overlapping Allow and Disallow rules are surfaced so you can see which rule will win for a given URL.
Run the validator above on any pasted content. The result will list every error and warning with the exact line number.
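The structural and directive checks described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tool's actual validator: the validate function, its messages, and the sample file are hypothetical, and rule-logic checks (overlapping Allow and Disallow) are left out.

```python
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}
RULE_DIRECTIVES = {"disallow", "allow"}

# Hypothetical sample with three deliberate mistakes.
SAMPLE = """\
Disallow: /private/
User-agent: *
Dissallow: /tmp/
Allow /public/
"""


def validate(robots_txt: str) -> list[tuple[int, str]]:
    """Return (line number, message) pairs for each problem found."""
    problems = []
    in_group = False
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' after the directive name"))
            continue
        name = line.partition(":")[0].strip().lower()
        if name not in KNOWN_DIRECTIVES:
            problems.append((number, f"unknown directive '{name}'"))
        elif name == "user-agent":
            in_group = True
        elif name in RULE_DIRECTIVES and not in_group:
            problems.append((number, f"'{name}' appears before any User-agent line"))
    return problems


for number, message in validate(SAMPLE):
    print(f"line {number}: {message}")
```

Directive names are compared case-insensitively here, which matches how crawlers treat them; only the paths are case-sensitive.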
How do I test if Googlebot is blocked from a page?
Select Googlebot from the user-agent picker, enter the page URL, and run the test. The tool returns one of three results:
- Allowed: Googlebot can crawl the URL.
- Blocked: Googlebot will not crawl the URL, and the tool shows the rule that blocks it.
- Allowed by exception: a broader Disallow rule applies, but a more specific Allow rule overrides it for this URL.
Test multiple URLs at once when validating a new robots.txt file before rollout, especially for category pages, product detail pages, faceted search URLs, and any path you have changed recently.
Robots.txt examples and directive syntax
Below are common robots.txt patterns. Paste any of them into the tester above to see exactly how each rule behaves.
Block all crawlers from the entire site
User-agent: *
Disallow: /
Used on staging environments. Never deploy this to production.
Allow all crawlers to access everything
User-agent: *
Disallow:
An empty Disallow line, or no Disallow line at all, means full access.
Block specific directories and paths
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /search?
Useful for admin panels, internal search results, and temporary directories.
Allow Googlebot but block Bingbot
User-agent: Googlebot
Disallow:
User-agent: Bingbot
Disallow: /
Each User-agent group is independent. Bingbot will follow only its own block, not the rules under Googlebot.
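Group selection can be sketched as a lookup: a crawler uses the group that names it, falls back to the * group if one exists, and otherwise has no rules at all. The select_group function and the dictionary layout are assumptions made for this illustration, and matching is simplified to a case-insensitive comparison of the full token.

```python
# Hypothetical parsed form of the example above: token -> rule lines.
GROUPS = {
    "Googlebot": ["Disallow:"],
    "Bingbot": ["Disallow: /"],
}


def select_group(groups: dict[str, list[str]], crawler_token: str) -> list[str]:
    """Return the rules a crawler follows: its own group, else '*', else none."""
    by_token = {token.lower(): rules for token, rules in groups.items()}
    token = crawler_token.lower()
    if token in by_token:
        return by_token[token]
    return by_token.get("*", [])


print(select_group(GROUPS, "bingbot"))      # Bingbot's own group only
print(select_group(GROUPS, "DuckDuckBot"))  # no matching group, so no rules apply
```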
Block AI crawlers while allowing search engines
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: *
Disallow:
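This pattern asks the listed AI crawlers to stay away while leaving the site fully crawlable for organic search. Compliance is voluntary, so it reduces rather than guarantees exclusion from AI training datasets. Google-Extended is a control token rather than a separate crawler: it tells Google not to use crawled content for its AI models.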
Sitemap directive
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-news.xml
The Sitemap directive is independent of any User-agent group, can appear anywhere in the file, and should use an absolute URL. It tells every crawler that supports the directive where to find your sitemap files.
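To confirm which sitemap URLs a parser actually picks up, Python's standard-library urllib.robotparser exposes a site_maps() method (Python 3.8 and later). The snippet below is a minimal sketch using the example file above.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-news.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when the file declares no sitemaps.
sitemaps = parser.site_maps() or []
for url in sitemaps:
    print(url)
```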
Common robots.txt mistakes
These mistakes account for most robots.txt incidents in the wild.
- Disallow: / on production: usually copied over from a staging robots.txt and never reverted. Fix immediately.
- Blocking render-critical resources: disallowing /css/, /js/, or /assets/ prevents Google from rendering the page properly and can suppress mobile rankings.
- Using noindex inside robots.txt: Google stopped supporting the noindex directive in robots.txt in 2019. Use the meta robots tag or X-Robots-Tag HTTP header instead.
- Case-sensitive paths: robots.txt paths are case-sensitive. /Admin/ does not match /admin/.
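- Trailing-slash mismatches: Disallow: /category is a prefix match, so it blocks /category, /category/page, and also /category-sale or /category.html. Disallow: /category/ blocks only paths under /category/.
- Forgetting that robots.txt is per host and protocol: the file at https://example.com/robots.txt does not apply to blog.example.com, and it does not apply to http://example.com either. Each protocol, host, and port combination needs its own file.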
- Relying on Disallow to keep URLs out of Google: a blocked URL can still be indexed (without content) if external links point to it. Use noindex on a crawlable page instead.
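Because a robots.txt file applies only to the host it is served from, it helps to compute which file governs a given page before testing. The robots_url helper below is a hypothetical sketch built on urllib.parse; it keeps the scheme, host, and port of the page URL and replaces everything else with /robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # netloc carries the host and any explicit port; path and query are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for page in (
    "https://example.com/products/widget",
    "https://blog.example.com/posts/1?ref=home",
    "http://example.com/products/widget",
):
    print(page, "->", robots_url(page))
```

The three pages above resolve to three different robots.txt files, even though they all belong to the same site.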
Robots.txt vs meta robots vs X-Robots-Tag
These three mechanisms control different things and are commonly confused.
| Mechanism | Where it lives | Controls | Best for |
|---|---|---|---|
| Robots.txt | /robots.txt at the site root | Crawling (whether a bot fetches the URL) | Blocking low-value paths, internal search, admin |
| Meta robots | HTML head of each page | Indexing, snippet behavior, link following | Keeping individual pages out of the index |
| X-Robots-Tag | HTTP response header | Indexing for any resource, including PDFs and images | Non-HTML files, large-scale index control |
If you need a page out of the index, use noindex (meta or X-Robots-Tag) on a page that is allowed in robots.txt. If you need to save crawl budget, use Disallow in robots.txt.
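The interaction between the two controls can be written down as a small decision function. This is an illustrative sketch only: the diagnose function and its messages are hypothetical, and the header parsing is simplified to comma-separated directives with no bot-specific prefixes.

```python
def diagnose(crawl_allowed: bool, x_robots_tag: str = "") -> str:
    """Describe the likely outcome for a URL given its crawl and index signals."""
    directives = {d.strip().lower() for d in x_robots_tag.split(",")}
    # "none" is shorthand for "noindex, nofollow".
    noindex = bool(directives & {"noindex", "none"})
    if noindex and not crawl_allowed:
        return "conflict: noindex cannot be seen because crawling is blocked"
    if noindex:
        return "crawlable: noindex will be seen on the next crawl"
    if not crawl_allowed:
        return "not crawled: the URL may still be indexed from external links"
    return "crawlable and indexable"


print(diagnose(crawl_allowed=False, x_robots_tag="noindex"))
print(diagnose(crawl_allowed=True, x_robots_tag="noindex, nofollow"))
```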
How robots.txt affects crawl budget
Crawl budget matters mainly for sites with hundreds of thousands of URLs. Googlebot limits how many requests it makes to a host in a given period. If those requests are spent on faceted URLs, calendar widgets, and infinite filter combinations, important pages stay uncrawled.
Robots.txt is the most direct lever for shaping crawl budget. By disallowing entire directories of low-value URLs, you redirect Googlebot to the content that should actually rank. Combine this with a clean XML sitemap, hreflang where relevant, and consistent canonical tags for the strongest signal.
For a deeper dive into crawl-control techniques alongside .htaccess and Nginx, see the SEO guide to robots, .htaccess, and Nginx. For HTTP-level controls, see the guide to HTTP status codes for SEO.
How to upload a robots.txt file to your site
Once the file passes validation above, deploy it to the exact path /robots.txt on the root host. The steps differ by platform:
- Static sites (Next.js, Astro, Hugo, etc.): drop robots.txt in the public or static directory; the build copies it to the site root, where it is served at /robots.txt.
- WordPress: most SEO plugins (Yoast, Rank Math, All in One SEO) include a robots.txt editor under their tools menu.
- Shopify, Webflow, Squarespace: each platform exposes a robots.txt editor in the admin; consult platform documentation.
- Custom servers: place the file in the document root and verify it is served with Content-Type: text/plain.
After uploading, fetch the file from a private browser window to confirm it is publicly accessible, then re-run the validator above against the deployed URL.
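The post-upload check can also be scripted. The sketch below uses urllib.request from the standard library; check_response and check_deployed are hypothetical helper names, and the URL in the usage comment is a placeholder. Note that urlopen raises an HTTPError for 4xx and 5xx responses, so a missing file surfaces as an exception rather than a status in the returned list.

```python
import urllib.request


def check_response(status: int, content_type: str) -> list[str]:
    """Return a list of problems with a robots.txt HTTP response."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    if not content_type.lower().startswith("text/plain"):
        problems.append(f"unexpected Content-Type {content_type!r}")
    return problems


def check_deployed(url: str) -> list[str]:
    """Fetch a live robots.txt and check its status and Content-Type."""
    with urllib.request.urlopen(url, timeout=10) as response:
        content_type = response.headers.get("Content-Type", "")
        return check_response(response.status, content_type)


# Usage (performs a network request):
#   check_deployed("https://www.example.com/robots.txt")
```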
Why test robots.txt before every deploy
A misconfigured robots.txt is invisible to most monitoring. Pages stop being crawled, then drop from the index over days or weeks, then rankings decay, and only then does someone notice. Running the file through this validator before every release catches most of these errors in seconds, before they reach production.
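A pre-deploy check of this kind can be automated in a build pipeline. The sketch below is one possible approach, assuming a hypothetical list of URLs that must always stay crawlable; blocked_urls and the sample files are illustrative names. It relies on Python's urllib.robotparser, which applies rules in file order rather than by longest match, so it suits simple files best.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical URLs that must never be blocked in production.
MUST_ALLOW = [
    "https://www.example.com/",
    "https://www.example.com/products/widget",
]

STAGING_FILE = "User-agent: *\nDisallow: /\n"
PRODUCTION_FILE = "User-agent: *\nDisallow: /admin/\n"


def blocked_urls(robots_txt: str, must_allow: list[str],
                 user_agent: str = "Googlebot") -> list[str]:
    """Return the must-allow URLs that the given file blocks."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in must_allow if not parser.can_fetch(user_agent, url)]


for name, content in (("staging", STAGING_FILE), ("production", PRODUCTION_FILE)):
    blocked = blocked_urls(content, MUST_ALLOW)
    status = "FAIL" if blocked else "ok"
    print(f"{name}: {status} {blocked}")
```

In a real pipeline the script would exit with a non-zero status when the list is not empty, which stops the release before a staging file reaches production.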
Frequently Asked Questions
What is a robots.txt file and how does it work?
A robots.txt file is a plain text file placed at the root of your website that tells search engine crawlers which URLs they can or cannot request from your site. It follows the Robots Exclusion Protocol (RFC 9309) and uses User-agent, Disallow, Allow, and Sitemap directives to control crawler behavior.
How do I test my robots.txt file for errors?
Paste your robots.txt content into the tester above, or load it from a URL. The tool will validate the syntax, flag misplaced directives, identify conflicts between Allow and Disallow rules, and let you simulate how Googlebot, Bingbot, and other crawlers will interpret each rule.
How do I know if a URL is blocked by robots.txt?
Enter the URL into the tester, choose a user-agent like Googlebot, and the validator will tell you whether the URL is allowed or disallowed and which specific rule produced that result.
How do I validate robots.txt syntax?
A robots.txt validator parses your file line by line and reports syntax errors, unknown directives, malformed wildcards, and rules placed outside a User-agent group. Run the file through the tester above to get a syntax report alongside per-URL allow or block results.
What are the most common robots.txt mistakes that hurt SEO?
The most damaging mistakes are: blocking the entire site with 'Disallow: /' on production, blocking CSS or JavaScript files that Google needs to render the page, using a comma instead of a newline between paths, placing Disallow before any User-agent, and relying on robots.txt to keep pages out of the search index (it only blocks crawling, not indexing).
Does robots.txt affect Google rankings?
Robots.txt does not directly impact rankings, but it controls crawling, which controls what Google can index and rank. Blocking important pages or render-critical resources will harm visibility indirectly. Allowing Google to crawl unimportant pages can also waste crawl budget on large sites.
Can I use robots.txt to prevent pages from appearing in Google?
No. Robots.txt blocks crawling, but blocked pages can still appear in search results if other sites link to them. To remove pages from the index, use a noindex meta tag, an X-Robots-Tag HTTP header, password protection, or remove the page entirely. The page must remain crawlable for Google to see the noindex directive.
What is the difference between Disallow and Allow in robots.txt?
Disallow blocks crawlers from a path. Allow explicitly permits a path, typically used as an exception inside an otherwise disallowed directory. When rules conflict, Googlebot and other crawlers that follow RFC 9309 apply the most specific rule, where specificity is measured by the length of the matching path. If an Allow and a Disallow rule are equally specific, Allow wins.
What is the difference between Disallow and noindex?
Disallow (in robots.txt) prevents crawling. Noindex (a meta tag or HTTP header) prevents indexing. A noindex tag only works if the crawler is allowed to fetch the page, so combining Disallow with noindex is contradictory and often results in pages staying in the index.
Which bots can I test with this robots.txt validator?
The tester supports the major search engine crawlers including Googlebot, Bingbot, and Bing-specific variants. You can also enter a custom user-agent string to simulate any other crawler, which is useful for testing rules targeting specific bots.
Can robots.txt block AI crawlers like GPTBot or ClaudeBot?
Yes, for crawlers that choose to comply. OpenAI (GPTBot), Anthropic (ClaudeBot, Claude-Web), Google (Google-Extended), Perplexity (PerplexityBot), Apple (Applebot-Extended), and others publish user-agent tokens that they say honor robots.txt. Add a User-agent block for each AI bot and use Disallow rules to ask them not to crawl your content for model training. Robots.txt is advisory, so it cannot stop a crawler that ignores it.
Where should the robots.txt file be placed?
Robots.txt must live at the root of the host, at the exact path /robots.txt (for example, https://yourdomain.com/robots.txt). Crawlers will not look anywhere else. The file applies only to the protocol, host, and port where it is published, so each subdomain needs its own robots.txt.
What does 'User-agent: *' mean?
The asterisk is a wildcard that matches every crawler that does not have a more specific User-agent group elsewhere in the file. Rules under User-agent: * apply to all bots by default, but if a bot has a dedicated block (for example, User-agent: Googlebot), it will follow only that block and ignore the wildcard group.
How do I optimize my robots.txt for better SEO?
Keep the file short and explicit. Allow Googlebot to access CSS, JavaScript, and image resources required to render pages. Use Disallow only for genuinely low-value paths (admin, internal search, faceted URLs). Reference your XML sitemap with the Sitemap directive. Test the file in this tool before deploying changes to production.
Can this tool simulate Googlebot behavior?
Yes. Select Googlebot (or Googlebot-Image, Googlebot-News, etc.) from the user-agent picker, then enter the URLs you want to test. The validator returns allow or block status for each URL and shows which rule matched, mirroring how Googlebot would interpret your file.
Is this robots.txt tester free to use?
Yes. The tool is completely free, requires no account, and has no daily limits. You can test, validate, edit, and download your robots.txt as many times as you need.