Reputation: 45
If I want to block access to all .py files on my server but allow all other files in the same folders as those .py files, what should I put in my robots.txt? This is what I have right now:
User-Agent: *
Disallow: /*.py
Upvotes: 0
Views: 710
Reputation: 7401
There is no "defined standard" for robots.txt files, but there is a lot of information aggregated on http://www.robotstxt.org/.
On http://www.robotstxt.org/robotstxt.html, it is stated:
Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".
Although some crawlers do allow wildcards, your safest bet for a reliable solution is to assume that no crawler will take your Disallow: line into account, so I suggest finding an alternative solution. Otherwise, you will have implemented something that some search engines support while leaving your site wide open to others.
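One such alternative, assuming the files are served by Apache with mod_headers enabled, is to ask compliant crawlers not to index .py files via an X-Robots-Tag response header instead of robots.txt. This is a sketch, not a drop-in rule, and note that X-Robots-Tag controls indexing rather than crawling:

```apache
# .htaccess sketch (assumes Apache with mod_headers enabled)
# Ask compliant crawlers not to index or follow links from any .py file
<FilesMatch "\.py$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```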
Upvotes: 2
Reputation: 1722
According to this page and this one, specific crawlers/robots (e.g. Googlebot and MSNBot) do support the use of the asterisk (*) in the "Disallow:" line.
For example, if you want to block Googlebot from your .py files, you'd use:
User-agent: Googlebot
Disallow: /*.py$
The dollar sign ($) anchors the pattern to the end of the URL (including the file extension). Note that Googlebot-Image and MSNBot also follow this syntax. However, since I wasn't able to find information about other crawlers supporting this feature, you might want to restrict this syntax to the "User-agent"s mentioned in this post.
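The * and $ semantics described above can be illustrated with a small sketch that converts such a Disallow pattern into a regular expression. The helper name is hypothetical and this is not a full robots.txt parser, just a demonstration of how the wildcard and end anchor match:

```python
import re

def robots_pattern_to_regex(pattern):
    """Convert a Google-style Disallow pattern (with * and $) to a
    compiled regex. Illustration only, not a full robots.txt parser."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn the robots.txt wildcard
    # * into the regex .*
    regex = re.escape(pattern).replace(r"\*", ".*")
    regex = "^" + regex + ("$" if anchored else "")
    return re.compile(regex)

rule = robots_pattern_to_regex("/*.py$")
print(bool(rule.match("/scripts/app.py")))      # True: .py file is matched (blocked)
print(bool(rule.match("/scripts/app.py?x=1")))  # False: $ stops the match at .py
print(bool(rule.match("/scripts/data.txt")))    # False: other files are not matched
```

Without the trailing $, the same pattern would also match URLs that merely contain ".py" somewhere in the middle, which is why the anchor matters for matching whole filenames.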
Of course, it'd be better in the long run to find a universal solution, but this might be a quick fix.
Upvotes: 1