Dónal

Reputation: 187399

Sitemap/robots.txt configuration conflict

My robots.txt contains the following rules:

Disallow: /api/

Allow: /
Allow: /apiDocs

The /apiDocs URL is in the sitemap, but according to Google Webmaster Tools, these robots.txt rules prohibit it from being crawled. I want to prevent all URLs that match /api/* from being crawled, but allow the URL /apiDocs to be crawled.

How should I change my robots.txt to achieve this?

Upvotes: 0

Views: 287

Answers (1)

unor

Reputation: 96737

  • Line breaks aren’t allowed in a record (you have one between your Disallow and the two Allow lines).

  • You don’t need Allow: / (it’s the same as Disallow:, which is the default).

  • You disallow crawling of /api/, i.e., any URL whose path starts with "/api/". The path /apiDocs has "Docs" rather than a "/" after "api", so that rule doesn’t match it, and there is no need for Allow: /apiDocs: it’s allowed anyway.

So your fallback record should look like:

User-Agent: *
Disallow: /login/
Disallow: /logout/
Disallow: /admin/
Disallow: /error/
Disallow: /festival/subscriptions
Disallow: /artistSubscription
Disallow: /privacy
Disallow: /terms
Disallow: /static
Disallow: /api/

When a bot is matched by this "fallback" record, it is allowed to crawl URLs whose paths start with /apiDocs.
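
If you want to double-check the behaviour, here is a minimal sketch using Python’s standard-library urllib.robotparser. The host example.com and the sample path /api/users are placeholders, and Python’s parser applies rules in file order rather than by longest match as Google does, but since this record contains no Allow lines the outcome is the same.

from urllib.robotparser import RobotFileParser

# The corrected fallback record from above, embedded as a string.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /login/
Disallow: /logout/
Disallow: /admin/
Disallow: /error/
Disallow: /festival/subscriptions
Disallow: /artistSubscription
Disallow: /privacy
Disallow: /terms
Disallow: /static
Disallow: /api/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Blocked: the path starts with /api/
print(parser.can_fetch("*", "https://example.com/api/users"))  # False

# Allowed: /apiDocs has no "/" after "api", so no Disallow rule matches it
print(parser.can_fetch("*", "https://example.com/apiDocs"))    # True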

Upvotes: 1
