Reputation: 117
I'm having trouble blocking two bad bots that keep sucking bandwidth from my site and I'm certain it has something to do with the * in the user-agent name that they use.
Right now, I'm using the following code to block the bad bots (this is an excerpt)...
# block bad bots
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^spider$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^robot$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^crawl$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^discovery$
RewriteRule .* - [F,L]
When I try to do RewriteCond %{HTTP_USER_AGENT} ^*bot$ [OR]
or RewriteCond %{HTTP_USER_AGENT} ^(*bot)$ [OR]
I get an error.
Guessing there is a pretty easy way to do this that I just haven't found yet on Google.
Upvotes: 1
Views: 3636
Reputation: 8806
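An asterisk (*) in a regular expression pattern needs to be escaped if you want to match it literally; otherwise it is interpreted as a quantifier ("zero or more of the preceding element"), and with nothing in front of it the pattern is invalid.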
RewriteCond %{HTTP_USER_AGENT} ^\*bot$
should do the trick.
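For context, here is a minimal sketch of how the escaped pattern could slot into a rule set like the one in the question. It assumes the bots really send the literal user-agent `*bot`; the `[NC]` flag is an addition here and only makes the match case-insensitive.

```apache
# Sketch only: assumes the offending user-agent is literally "*bot".
RewriteEngine On

# Block requests with an empty user-agent
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
# \* matches a literal asterisk; NC (an added assumption) ignores case
RewriteCond %{HTTP_USER_AGENT} ^\*bot$ [NC]

# Return 403 Forbidden; F implies L, so rewriting stops here
RewriteRule ^ - [F]
```

Note that the last RewriteCond in the chain carries no `[OR]`, since there is no further condition to OR it with.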
Upvotes: 1
Reputation: 1146
But how is this going to prevent Bad Bot access?
I work for a security company (also PM at Botopedia.org) and I can tell you that 99.9% of bad bots will not use any of these expressions in their user-agent string.
Most of the time Bad Bots will use legitimate looking user-agents (impersonating browsers and VIP bots like Googlebot) and you simply cannot filter them via user-agent data alone.
For effective bot detection you should look into other signs like:
1) Suspicious signatures (e.g. the order of header parameters)
and/or
2) Suspicious behavior (e.g. early robots.txt access or request rates/patterns)
Then you should use different challenges (e.g. JS, cookie, or even CAPTCHA) to verify your suspicions.
The problem you've described is often referred to as "parasitic drag".
This is a very real and serious issue, and we actually published research on it just a couple of months ago.
(We found that on an average-sized site 51% of visitors will be bots, 31% malicious.)
Honestly, I don't think you can solve this problem with a few lines of regex.
We offer our Bot filtering services for free and there are several others like us. (I can endorse good services if needed)
GL.
Upvotes: 0
Reputation: 786146
I think you are missing a dot (.). Change your condition to this:
RewriteCond %{HTTP_USER_AGENT} ^.*bot$ [OR]
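Here `.` matches any single character and `*` repeats it, so `^.*bot$` matches any user-agent that ends in `bot` (a hypothetical `examplebot`, say), not one containing a literal asterisk. A rough sketch of the condition in place, assuming it is the last one in the chain, in which case the trailing `[OR]` has to be dropped:

```apache
# Sketch only: blocks empty user-agents and any user-agent ending in "bot".
RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
# .* is "any run of characters", so this matches anything ending in "bot"
RewriteCond %{HTTP_USER_AGENT} ^.*bot$

# Return 403 Forbidden for matching requests
RewriteRule ^ - [F]
```

If more conditions follow this one, keep the `[OR]` on it and leave it off the final condition instead.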
Upvotes: 0