systempuntoout

Reputation: 74094

How to disallow access to a URL called without parameters using robots.txt

I would like to deny web robots access to a URL like this:

http://www.example.com/export

while allowing this kind of URL:

http://www.example.com/export?foo=value1

A spider bot is calling /export without a query string, causing a lot of errors in my log.
Is there a way to express this filter in robots.txt?

Upvotes: 0

Views: 222

Answers (1)

Pekka

Reputation: 449415

I am assuming you have problems with bots hitting the first URL in your example.

As said in the comment, this is probably not possible, because http://www.example.com/export is the resource's base URL. Even if it were possible as per the standard, I wouldn't trust bots to understand this properly.
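That said, some major crawlers (Googlebot among them) support a non-standard $ anchor that matches the end of the URL. For those bots only, a rule like the sketch below would block the bare /export while leaving the parameterized URLs crawlable; other bots may ignore or misread it:

    User-agent: *
    # Non-standard: "$" anchors the pattern at the end of the URL,
    # so only the bare /export (no query string) is disallowed.
    Disallow: /export$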

I would also not send a 401 Unauthorized or similar header if the URL is called without a query string, for the same reason: a bot could think that the resource is out of bounds entirely.

What I would do in your situation is this: if somebody arrives at

 http://www.example.com/export

send a 301 Moved Permanently redirect to the same URL plus a query string with some default values, like

 http://www.example.com/export?foo=0

This should keep the search engine index clean. (It won't fix the logging problem you state in your comment, though.)
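For illustration, here is a minimal sketch of that redirect in plain WSGI Python (the handler name, the port, and the foo=0 default are my assumptions, not anything from the question; adapt it to whatever stack actually serves /export):

    from wsgiref.simple_server import make_server

    # Assumed canonical URL with a default query string.
    DEFAULT_URL = "http://www.example.com/export?foo=0"

    def export_app(environ, start_response):
        if environ.get("PATH_INFO") == "/export" and not environ.get("QUERY_STRING"):
            # Bare /export: permanent redirect so crawlers replace it
            # with the parameterized URL in their index.
            start_response("301 Moved Permanently", [("Location", DEFAULT_URL)])
            return [b""]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"export for query: " + environ.get("QUERY_STRING", "").encode()]

    if __name__ == "__main__":
        make_server("", 8000, export_app).serve_forever()

Using 301 rather than 302 is what keeps the index clean: a permanent redirect tells crawlers to drop the bare URL in favor of the target.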

Upvotes: 1
