user2719565

Reputation: 45

Block access in robots.txt

If I want to block access to all .py files on my server but allow all other files in the same folder as those .py files, what should I put in my robots.txt? This is what I have right now:

User-Agent: *
Disallow: /*.py

Upvotes: 0

Views: 710

Answers (2)

Adrian Wragg

Reputation: 7401

There is no "defined standard" for robots.txt files, but there is a lot of information aggregated on http://www.robotstxt.org/.

On http://www.robotstxt.org/robotstxt.html, it is stated:

Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".

Although some crawlers do allow wildcards, if you want a reliable solution your safest bet is to assume that no crawler will honour your Disallow: line, and to find an alternative approach. Otherwise, you will have a solution that some search engines support whilst leaving your site wide open to others.
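One way to implement such an alternative is to block the files server-side, which stops every crawler and visitor regardless of robots.txt support. Here is a minimal sketch using Python's standard http.server module; the handler name and port are illustrative, not part of the original answer:

from http.server import HTTPServer, SimpleHTTPRequestHandler

class BlockPyHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        # Strip any query string, then refuse paths that end in .py.
        path = self.path.split("?", 1)[0]
        if path.endswith(".py"):
            self.send_error(403, "Access to .py files is forbidden")
            return
        # Everything else in the folder is served normally.
        super().do_GET()

if __name__ == "__main__":
    # Serves the current directory on an arbitrary port.
    HTTPServer(("", 8000), BlockPyHandler).serve_forever()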

Upvotes: 2

ajiang

Reputation: 1722

According to this page and this one, specific crawlers/robots (e.g. Googlebot and MSNBot) do support the use of the asterisk (*) in the "Disallow:" line.

For example, if you want to block Googlebot from your .py files, you'd use:

User-agent: Googlebot
Disallow: /*.py$

The dollar sign ($) anchors the pattern to the end of the URL, so /*.py$ matches only URLs that end in .py. Note that Googlebot-Image and MSNBot also follow this syntax. However, since I wasn't able to find information about other crawlers supporting this feature, you might want to restrict the rule to the user agents mentioned in this post, listing each one explicitly.
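For example, assuming the standard user-agent tokens for these crawlers (Googlebot, Googlebot-Image, and msnbot), the combined rules might look like this:

User-agent: Googlebot
Disallow: /*.py$

User-agent: Googlebot-Image
Disallow: /*.py$

User-agent: msnbot
Disallow: /*.py$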

Of course, it'd be better in the long run to find a universal solution, but this might be a quick fix.

Upvotes: 1
