karmapolice

Reputation: 336

robots.txt: how are ill-formed Disallow lines treated?

What happens when a Disallow line includes more than one URI? Example:

Disallow: / tmp/

The white space was introduced by mistake.

Is there a standard way in which web crawlers deal with this? Do they ignore the whole line, or do they ignore only the second URI and treat the line like:

Disallow: /

Upvotes: 0

Views: 60

Answers (1)

plasticinsect

Reputation: 1752

Google, at least, seems to treat the first non-space character after the colon as the beginning of the path, and the last non-space character as the end. Anything in between is counted as part of the path, even if it's a space. Google also silently percent-encodes certain characters in the path, including spaces.

So the following:

Disallow: / tmp/

will block:

http://example.com/%20tmp/

but it will not block:

http://example.com/tmp/

I have verified this on Google's robots.txt tester. YMMV for crawlers other than Google.
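
To make the behavior described above concrete, here is a rough Python sketch of it. This is not Google's actual parser, only an approximation of what the tester appeared to do. The function names `normalize_rule_path` and `is_blocked` are made up for illustration, and the matching is a plain prefix check that ignores the `*` and `$` wildcards.

```python
from urllib.parse import quote, urlsplit


def normalize_rule_path(line: str) -> str:
    """Return the path of a Disallow line, trimmed and percent-encoded.

    Hypothetical helper: it mimics the observed behavior, not a documented API.
    """
    field, _, value = line.partition(":")
    if field.strip().lower() != "disallow":
        raise ValueError("not a Disallow line")
    # Trim only leading and trailing whitespace; interior spaces stay in the path.
    path = value.strip()
    # Percent-encode characters such as spaces; keep "/" and existing "%" escapes.
    return quote(path, safe="/%*$")


def is_blocked(rule_line: str, url: str) -> bool:
    """Prefix-match the URL path against the normalized rule (assumed simplification)."""
    rule = normalize_rule_path(rule_line)
    if not rule:
        return False  # an empty Disallow value blocks nothing
    return urlsplit(url).path.startswith(rule)


print(normalize_rule_path("Disallow: / tmp/"))
print(is_blocked("Disallow: / tmp/", "http://example.com/%20tmp/"))
print(is_blocked("Disallow: / tmp/", "http://example.com/tmp/"))
```

Under these assumptions, the rule normalizes to /%20tmp/, so the first URL is reported as blocked and the second is not, matching what the tester showed.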

Upvotes: 1
