Reputation: 20496
What is the difference between the two robots.txt
files below?
User-agent: *
Allow: /
vs.
User-agent: *
Disallow:
On Wikipedia, the latter is listed as an example under the Examples
section.
However, later on it has code similar to the first file:
User-agent: bingbot
Allow: /
Crawl-delay: 10
Upvotes: 4
Views: 4268
Reputation: 25524
You should prefer the Disallow syntax:
User-agent: *
Disallow:
Disallow is part of the original robots.txt standard and is understood by every bot that obeys robots.txt.
Allow is extension syntax, introduced by Google and understood by only a few bots. It was added to make it possible to disallow everything but then re-allow a few things. It would be most appropriate to use it like this:
User-agent: *
Disallow: /
Allow: /public
In that case, most bots wouldn't be able to crawl the site at all, but the few bots that understand Allow: would be able to crawl the public directory.
When Disallow: and Allow: directives conflict (as in the above example), the longer rule that applies to a given URL takes precedence. For example, /public/foo would use the Allow: /public rule, because both rules could apply but that rule is longer; /private/foo would use the Disallow: / rule, because only it matches. The order of the rules makes no difference.
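To see how this longest-match rule plays out, here is a minimal Python sketch of the precedence logic. The rules and paths are the ones from this answer; real parsers also handle details such as wildcards and percent-encoding, which are omitted here:
def is_allowed(rules, url_path):
    # rules: list of (directive, path) pairs, e.g. ("allow", "/public").
    # The longest path that prefix-matches the URL wins; if no rule
    # matches, the URL is allowed by default.
    best = None
    for directive, path in rules:
        if url_path.startswith(path):
            if best is None or len(path) > len(best[1]):
                best = (directive, path)
    return best is None or best[0] == "allow"

rules = [("disallow", "/"), ("allow", "/public")]
print(is_allowed(rules, "/public/foo"))   # True: Allow: /public is longer
print(is_allowed(rules, "/private/foo"))  # False: only Disallow: / matches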
Upvotes: 6
Reputation: 1326
The first one tells all user agents, such as web crawlers or Google's indexing bots, that they are allowed to explore the whole website, since / is the root path of the website. For example, http://example.org would be /, and https://example.org/admin would be /admin in your robots.txt.
The Disallow directive does exactly the opposite: it tells the user agents to stay out of the listed paths.
Allow and Disallow can be used in different ways, for example as a whitelist or a blacklist.
And because of that, the following
User-agent: *
Allow: /
is the same as
User-agent: *
Disallow:
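You can check this equivalence with Python's standard-library urllib.robotparser, which understands both forms (a quick sketch; the URL is just an example):
from urllib.robotparser import RobotFileParser

def can_fetch(robots_txt, url):
    # Feed a robots.txt body to the parser and ask whether a generic
    # bot ("*") may fetch the given URL.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)

# Both files permit crawling the same URL.
print(can_fetch("User-agent: *\nAllow: /", "https://example.org/admin"))   # True
print(can_fetch("User-agent: *\nDisallow:", "https://example.org/admin"))  # True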
The easiest way of understanding this is to think of Allow and Disallow as "lists" of paths, where just one type of directive (Allow or Disallow) should be used.
For example, let's use the Disallow directive as a blacklist, denying only the Bing indexer access to our website:
User-agent: Bingbot
Disallow: /
User-agent: *
Disallow:
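The same urllib.robotparser check confirms the per-agent behaviour (the second bot name is made up for illustration):
from urllib.robotparser import RobotFileParser

robots_txt = """User-agent: Bingbot
Disallow: /

User-agent: *
Disallow:
"""
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
# Bingbot is blocked everywhere; every other bot is allowed.
print(parser.can_fetch("Bingbot", "https://example.org/"))       # False
print(parser.can_fetch("SomeOtherBot", "https://example.org/"))  # True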
In short: if Disallow is empty, then everything is allowed; and if you allow everything, then nothing is disallowed.
Upvotes: 1