fips123

Reputation: 281

robots.txt: Does Wildcard mean no characters too?


I have the following example robots.txt and questions about the wildcard:

User-agent: *

Disallow: /*/admin/*

Does this rule now apply to both of these pages:

http://www.example.org/admin and http://www.example.org/es/admin

So, can the wildcard stand for no characters?

Upvotes: 1

Views: 422

Answers (1)

unor

Reputation: 96697

In the original robots.txt specification, * in Disallow values has no special meaning; it’s just a character like any other. So, bots following the original spec would crawl http://www.example.org/admin as well as http://www.example.org/es/admin.

Some bots support "extensions" of the original robots.txt spec, and a popular extension is interpreting * in Disallow values as a wildcard. However, these extensions aren’t standardized anywhere, so each bot could interpret them differently.

The most popular definition is arguably the one from Google Search (Google says that Bing, Yahoo, and Ask use the same definition):

* designates 0 or more instances of any valid character

Your example

When interpreting the * according to the above definition, both of your URLs would still be allowed to be crawled, though.

Your /*/admin/* requires three slashes in the path, but http://www.example.org/admin has only one, and http://www.example.org/es/admin has only two.
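To make the slash counting concrete, here is a minimal Python sketch of wildcard matching under the definition quoted above. The helper name `rule_matches` is hypothetical, and the sketch assumes that a rule only has to match the beginning of the URL path and that `*` is the only special character. Real bots may behave differently.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: does a Disallow value match a URL path?

    Assumes '*' means "0 or more of any character" and that the rule
    only has to match the beginning of the path. Other extensions
    are deliberately not handled here.
    """
    # Escape the literal parts and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.match anchors at the start only, which gives prefix matching.
    return re.match(regex, path) is not None

rule = "/*/admin/*"
for path in ["/admin", "/es/admin", "/es/admin/users"]:
    print(path, "blocked" if rule_matches(rule, path) else "allowed")
```

Under these assumptions, only the last path (which contains three slashes) is reported as blocked.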

(Also note that the empty line between the User-agent and the Disallow lines is not allowed.)

You might want to use this:

User-agent: *
Disallow: /admin
Disallow: /*/admin

This would block at least the same URLs, but possibly more than you want to block (depending on your URLs):

User-agent: *
Disallow: /*admin
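As a rough illustration of the "possibly more" part, the following Python sketch compares the two rule sets on a few made-up paths. It assumes the Google-style meaning of `*` (0 or more characters) and prefix matching. The helper `rule_matches` and the path `/sysadmin-guide` are hypothetical and only serve the example.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    # Hypothetical helper: '*' becomes '.*', everything else is literal,
    # and the rule only needs to match the start of the path.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex, path) is not None

def blocked(rules: list[str], path: str) -> bool:
    # A path counts as blocked if any Disallow value matches it.
    return any(rule_matches(rule, path) for rule in rules)

narrow = ["/admin", "/*/admin"]   # the two-line variant
broad = ["/*admin"]               # the single-line variant

for path in ["/admin", "/es/admin", "/sysadmin-guide"]:
    print(path, "narrow:", blocked(narrow, path), "broad:", blocked(broad, path))
```

In this sketch, both variants block /admin and /es/admin, but only the single-line variant also blocks /sysadmin-guide.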

Keep in mind that bots that follow the original robots.txt spec would not apply the rules containing * as intended, because they interpret * literally. If you want to cover both kinds of bots, you would have to add multiple records: a record with User-agent: * for the bots that follow the original spec, and a record listing all user agents (in User-agent) that support the wildcard.

Upvotes: 1
