I have the following files on my server:
/file
/file.html
/file/bob.html
I want to exclude them all from being indexed. Is the following robots.txt enough?
User-Agent: *
Disallow: /file
Or even just:
User-Agent: *
Disallow: /f
Note:
I understand that Google's bots would treat Disallow: /file as blocking all of the files mentioned above (see https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt), but I want to address not only Google but all well-behaved bots, so my question is about the original standard, not later extensions to it.
Upvotes: 0
Views: 399
Reputation: 1752
In short, yes.
If you have:
User-agent: *
Disallow: /abc
It will block anything that starts with /abc, including:
/abc
/abc.html
/abc/def/ghi
/abcdefghi
/abc?x=123
This is part of the original robots.txt standard, and it applies to all robots that obey robots.txt.
The thing to remember about robots.txt is that it's deliberately not very sophisticated. It was designed to be simple and easy for crawlers to implement. Unless you use an extension (like wildcards), it's a plain string comparison: the directive matches any URL path that starts with the sequence of characters you give.
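To make that concrete, here is a minimal Python sketch (not any real robots.txt library; the function name is made up) of that prefix comparison, applied to the paths from your question:

def is_disallowed(path, disallow_prefixes):
    # Original-standard matching: a URL path is disallowed if it starts
    # with any non-empty Disallow value, compared character by character.
    return any(path.startswith(prefix) for prefix in disallow_prefixes if prefix)

rules = ["/file"]  # the single Disallow value from the question

for path in ["/file", "/file.html", "/file/bob.html", "/filet.txt", "/other"]:
    print(path, "->", "blocked" if is_disallowed(path, rules) else "allowed")

# Output:
# /file -> blocked
# /file.html -> blocked
# /file/bob.html -> blocked
# /filet.txt -> blocked
# /other -> allowed

Note the /filet.txt line: because matching is purely by prefix, the rule also catches paths you may not have intended, which is the same reason Disallow: /f would block every path starting with "f".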
Upvotes: 1