Reputation: 307
I want my robots.txt to allow only index.php and the images folder and to disallow everything else. Is this possible?
This is my code:
User-agent: *
Allow: /index.php
Allow: /images
Disallow: /
Secondly, is it possible to do the same job with .htaccess?
Upvotes: 2
Views: 1415
Reputation: 9227
First, be aware that the "Allow" option is actually a non-standard extension and is not supported by all crawlers. See the wiki page (in the "Nonstandard extensions" section) and the robotstxt.org page.
The robotstxt.org page puts it this way: "This is currently a bit awkward, as there is no 'Allow' field. The easy way is to put all files to be disallowed into a separate directory, say 'stuff', and leave the one file in the level above this directory."
Some major crawlers do support it, but frustratingly they handle it in different ways. For example, Google picks the most specific matching rule, judged by the length of the matching path, whereas Bing prefers you to just put the Allow statements first. The example you've given above will work in both cases, though.
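To make the difference concrete, here is a minimal sketch of the two strategies described above. The function names `first_match` and `longest_match` are made up for illustration, the matching is plain prefix matching (no `*` or `$` wildcards), and neither function is any crawler's actual implementation.

```python
# Rules from the question, in file order: (kind, path prefix).
RULES = [
    ("allow", "/index.php"),
    ("allow", "/images"),
    ("disallow", "/"),
]


def first_match(rules, path):
    """Order-sensitive: the first rule whose prefix matches decides."""
    for kind, prefix in rules:
        if path.startswith(prefix):
            return kind == "allow"
    return True  # no rule matched, so the path may be crawled


def longest_match(rules, path):
    """Order-insensitive: the longest matching prefix decides; Allow wins ties."""
    best_len, allowed = -1, True
    for kind, prefix in rules:
        if path.startswith(prefix):
            is_allow = kind == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


# The paths below are hypothetical examples.
for path in ("/index.php", "/images/logo.png", "/private/notes.html"):
    print(path, first_match(RULES, path), longest_match(RULES, path))
```

With the rules in the order given in the question, both strategies agree. If you move `Disallow: /` to the top, `first_match` blocks everything while `longest_match` still lets `/index.php` through, which is why keeping the Allow lines first is the safe choice.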
Bear in mind that crawlers which do not support it will simply ignore it, and will therefore see only your "Disallow" rule, effectively stopping them from crawling your entire site! You have to decide whether the extra work of moving files around (or writing a long list of Disallow rules for all your subdirectories) is really worth the bonus of getting indexed by the lesser crawlers. Probably not.
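You can approximate what such a crawler sees by stripping the Allow lines before parsing. This is only a rough sketch: `legacy_view` is a hypothetical helper, "ExampleBot" is a made-up user agent, and real legacy crawlers may behave differently. The parsing itself uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /index.php
Allow: /images
Disallow: /
"""


def legacy_view(robots_txt):
    """Drop Allow lines to mimic a crawler that does not understand them (assumption)."""
    return [
        line
        for line in robots_txt.splitlines()
        if not line.strip().lower().startswith("allow:")
    ]


parser = RobotFileParser()
parser.parse(legacy_view(ROBOTS_TXT))

# Only "Disallow: /" is left, so every path is refused.
for path in ("/index.php", "/images/logo.png", "/"):
    print(path, parser.can_fetch("ExampleBot", path))
```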
As for .htaccess, you can't really do anything useful with it here. You'd have to match the user agent against a large list of known bots, and you'd just end up missing some or, worse, blocking real users.
Upvotes: 2
Reputation: 441
Yes, that code is correct. The robots.txt file is read from top to bottom, so as long as the Disallow rule is at the bottom you won't run into problems. This is because a crawler uses the first rule that matches: if the Disallow rule were at the top, the crawler would never reach the Allow statements.
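If you want to sanity-check the file yourself, Python's standard `urllib.robotparser` can parse it locally. This is just a quick sketch: the helper `allowed`, the user agent "ExampleBot", and the test paths are placeholders, and the result only tells you how this one parser reads the file, not how every crawler will.

```python
from urllib.robotparser import RobotFileParser

# The rules from the question, in the same order.
RULES = [
    "User-agent: *",
    "Allow: /index.php",
    "Allow: /images",
    "Disallow: /",
]


def allowed(path, agent="ExampleBot"):
    """Parse RULES and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(RULES)
    return parser.can_fetch(agent, path)


# Hypothetical paths: the first two should be crawlable, the last one not.
for path in ("/index.php", "/images/logo.png", "/admin/"):
    print(path, allowed(path))
```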
Edit/Sidenote:
This is only for "good" robots (Googlebot, Bingbot, etc.) which follow the standard. Plenty of other robots either misinterpret the robots.txt file or just completely ignore it.
Upvotes: -1