Reputation: 20279
I'm trying to block all bots/crawlers/spiders for a particular directory. How can I do that with .htaccess? I searched a little and found a solution that blocks based on the user agent:
RewriteCond %{HTTP_USER_AGENT} googlebot
Now I need more user agents (for all known bots), and the rule should apply only to my separate directory. I already have a robots.txt, but not all crawlers honor it. Blocking by IP address is not an option. Are there other solutions? I know about password protection, but I would have to ask first whether that is an option. Either way, I am looking for a solution based on the user agent.
Upvotes: 11
Views: 82626
Reputation: 785058
Why use .htaccess or mod_rewrite for a job that robots.txt is specifically meant for? Here is the robots.txt snippet you will need to block a specific set of directories for search crawlers:
User-agent: *
Disallow: /subdir1/
Disallow: /subdir2/
Disallow: /subdir3/
This will keep all search bots that honor robots.txt out of the directories /subdir1/, /subdir2/ and /subdir3/.
For more explanation see here: http://www.robotstxt.org/orig.html
Upvotes: 12
Reputation: 232
I know the topic is old, but for people who land here (as I did): take a look at the 5G Blacklist 2013 (08/2023 update: 7G Firewall / 8G Firewall beta).
It's a great help, and not only for WordPress but for all other sites too. It works very well, in my opinion.
Another one worth looking at is the Linux Reviews article on anti-spam through .htaccess (last functional archived link).
Upvotes: 8
Reputation: 165118
You need to have mod_rewrite enabled. Place this in the .htaccess file in that folder. If it is placed elsewhere (e.g. in a parent folder), the RewriteRule pattern needs to be modified slightly to include that folder name.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]
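For the parent-folder case mentioned above, a minimal sketch might look like the following. It assumes the .htaccess file sits in the document root and that the protected directory is called `private/`; that name is hypothetical, so substitute your own. The list of user agents is the same illustrative three and is not a complete list of known bots.

```apache
RewriteEngine On
# Match the listed crawler user agents, case-insensitively ([NC]).
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
# In .htaccess the pattern is matched against the path relative to this
# directory, without a leading slash. "private/" is a hypothetical name.
# [F] answers with 403 Forbidden and stops further rewriting.
RewriteRule ^private/ - [F]
```

Here `[F]` is used as a shorthand that should behave like `[R=403,L]` in the rule above. Requests outside `private/` do not match the pattern and are left untouched.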
Upvotes: 22