I have the following files on my server:
/file
/file.html
/file/bob.html
I want to exclude them all from being indexed. Is the following robots.txt enough?
User-Agent: *
Disallow: /file
Or even just:
User-Agent: *
Disallow: /f
Note:
I understand that Google's bots would treat Disallow: /file as blocking all of the files mentioned above (see https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt), but I want to address not only Google but all well-behaved bots, so my question is about the original standard, not later extensions to it.
Upvotes: 0
Views: 399
Reputation: 1752
In short, yes.
If you have:
User-agent: *
Disallow: /abc
It will block anything that starts with /abc, including:
/abc
/abc.html
/abc/def/ghi
/abcdefghi
/abc?x=123
This is part of the original robots.txt standard, and it applies to all robots that obey robots.txt.
The thing to remember about robots.txt is that it's deliberately not very sophisticated. It was designed to be simple and easy for crawlers to implement. Unless you use an extension (like wildcards), it's a plain string comparison: the directive matches any URL path that starts with the sequence of characters you give.
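To make that concrete, here is a minimal Python sketch (not any real robots.txt library; the function name is made up) of that prefix comparison, applied to the paths from your question:

def is_disallowed(path, disallow_prefixes):
    # Original-standard matching: a URL path is disallowed if it starts
    # with any non-empty Disallow value, compared character by character.
    return any(path.startswith(prefix) for prefix in disallow_prefixes if prefix)

rules = ["/file"]  # the single Disallow value from the question

for path in ["/file", "/file.html", "/file/bob.html", "/filet.txt", "/other"]:
    print(path, "->", "blocked" if is_disallowed(path, rules) else "allowed")

# Output:
# /file -> blocked
# /file.html -> blocked
# /file/bob.html -> blocked
# /filet.txt -> blocked
# /other -> allowed

Note the /filet.txt line: because matching is purely by prefix, the rule also catches paths you may not have intended, which is the same reason Disallow: /f would block every path starting with "f".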
Upvotes: 1