Reputation: 281
I have the following example robots.txt and questions about the wildcard:
User-agent: *
Disallow: /*/admin/*
Does this rule now apply to both of these pages:
http://www.example.org/admin and http://www.example.org/es/admin
In other words, can the wildcard stand for zero characters?
Upvotes: 1
Views: 422
Reputation: 96697
In the original robots.txt specification, * in Disallow values has no special meaning; it's just a character like any other. So bots following the original spec would crawl http://www.example.org/admin as well as http://www.example.org/es/admin.
Some bots support "extensions" of the original robots.txt spec, and a popular extension is interpreting *
in Disallow
values to be a wildcard. However, these extensions aren’t standardized somewhere, each bot could interpret it differently.
The most popular definition is arguably the one from Google Search (Google says that Bing, Yahoo, and Ask use the same definition):
* designates 0 or more instances of any valid character
When interpreting the * according to the above definition, both of your URLs would still be allowed to be crawled, though. Your /*/admin/* requires three slashes in the path, but http://www.example.org/admin has only one, and http://www.example.org/es/admin has only two.
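A minimal Python sketch of this kind of matching (assuming the Google-style rule that * matches zero or more characters and that a rule only needs to match a prefix of the path; the $ end anchor and percent-encoding details are ignored, and the helper name is made up for this illustration) shows why neither URL is caught:
import re

def wildcard_rule_matches(rule, path):
    # Toy helper, not a real library function: '*' matches zero or more
    # characters, and the rule only needs to match a prefix of the path.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    return re.match(regex, path) is not None

for path in ["/admin", "/es/admin", "/es/admin/index.html"]:
    print(path, wildcard_rule_matches("/*/admin/*", path))
# Expected output:
# /admin False                 (only one slash)
# /es/admin False              (only two slashes)
# /es/admin/index.html True    (three slashes, so the rule applies)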
(Also note that the empty line between the User-agent and the Disallow lines is not allowed.)
You might want to use this:
User-agent: *
Disallow: /admin
Disallow: /*/admin
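For illustration, a rough Python comparison of the two interpretations for these rules (literal prefix matching as in the original spec versus the simplified wildcard matching sketched above; the helper names are invented for this sketch):
import re

def original_spec_blocks(rule, path):
    # Original spec: the Disallow value is a literal path prefix.
    return path.startswith(rule)

def wildcard_blocks(rule, path):
    # Wildcard extension: '*' matches zero or more characters (simplified).
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    return re.match(regex, path) is not None

rules = ["/admin", "/*/admin"]
for path in ["/admin", "/es/admin"]:
    print(path,
          any(original_spec_blocks(r, path) for r in rules),
          any(wildcard_blocks(r, path) for r in rules))
# Expected output:
# /admin True True
# /es/admin False True   (original-spec bots read /*/admin literally, so only
#                         wildcard-aware bots block the /es/ variant)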
This would block at least the same URLs, but possibly more than you want to block (depending on your URLs):
User-agent: *
Disallow: /*admin
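As a rough illustration of how much wider /*admin casts the net (same simplified wildcard matching as in the sketches above), it also catches unrelated paths that merely contain "admin":
import re

def wildcard_blocks(rule, path):
    # '*' matches zero or more characters; prefix match (simplified).
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    return re.match(regex, path) is not None

for path in ["/admin", "/es/admin", "/badminton-club", "/products"]:
    print(path, wildcard_blocks("/*admin", path))
# Expected output:
# /admin True
# /es/admin True
# /badminton-club True   (blocked only because the path contains "admin")
# /products False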
Keep in mind that bots that follow the original robots.txt spec would ignore it, as they interpret * literally. If you want to cover both kinds of bots, you would have to add multiple records: a record with User-agent: * for the bots that follow the original spec, and a record listing all user agents (in User-agent lines) that support the wildcard. An example layout is sketched below.
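Such a file could look roughly like this (the wildcard-aware user agents listed here, Googlebot and Bingbot, are only placeholders; list whichever bots you actually care about, and note that the first record can only use literal prefixes, so /es/admin has to be spelled out):
# Bots that follow the original spec (no wildcard support)
User-agent: *
Disallow: /admin
Disallow: /es/admin

# Bots that support the wildcard extension
User-agent: Googlebot
User-agent: Bingbot
Disallow: /admin
Disallow: /*/admin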
Upvotes: 1