jdhildeb

Reputation: 3829

How to block a specific user agent in Apache

I'm configuring my Django app to email me errors (exceptions).

Normally no problem - but my email is hosted on Office 365, and it seems that Microsoft is automatically scanning and loading URLs within emails.

The result is that it hits the URL in my Django app, and causes another error... and another email. End result: a charming little mail loop which sends me 50+ messages within a few seconds.

I found entries like this in my apache logs:

157.55.39.163 - - [22/Aug/2018:17:30:05 +0000] "GET /testerror HTTP/1.1" 500 5808 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b"

I want to block access for any user agent containing "BingPreview", so I can prevent this loop.

I put this into my virtualhost:

SetEnvIf User-Agent "^.*BingPreview.*$" bad_user

<Directory /path/top/my/app/>
   <Files wsgi.py>
       Require not env bad_user
   </Files>
</Directory>

But when I reload Apache, I get the error: "negative Require directive has no effect in <RequireAny> directive".

Upvotes: 16

Views: 24747

Answers (3)

Mikhail

Reputation: 11

$user_agents = [
    'python',
    'SemrushBot',
    'MJ12bot',
    'BLEXBot',
    'gulperbot',
    'scanner',
    'Jigsaw',
    'PhantomJS',
    'checker',
    'netcraft',
    'bingbot',
    'Dalvik',
    'AhrefsBot',
    'Bytespider'
];

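// Treat a missing User-Agent header as an empty string to avoid a PHP warning.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';

foreach ($user_agents as $agent) {
    // strpos() is case-sensitive: 'python' will not match 'Python-urllib'.
    if (strpos($ua, $agent) !== false) {
        header('HTTP/1.1 404 Not Found');
        exit;
    }
}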

Place this at the top of your PHP index file, before any output is sent.

Upvotes: 1

jdhildeb

Reputation: 3829

Got it figured out. Thanks for the tip, @Tobias K.

I enabled mod_rewrite because it wasn't already enabled.

a2enmod rewrite

Then I put this into my virtual host:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT}  ^.*BingPreview.*$
RewriteRule . - [R=403,L]

The RewriteCond %{HTTP_USER_AGENT} ^.*BingPreview.*$ line sets a condition for the RewriteRule that follows. It tests the request's User-Agent header (exposed as the HTTP_USER_AGENT variable) against the pattern ^.*BingPreview.*$, which matches any user agent string containing BingPreview: ^ and $ anchor the start and end of the string, and .* matches any number of any characters. When the condition matches, the rule returns a 403 response instead of passing the request on to the app.

Then I restarted Apache for the change to take effect:

service apache2 restart

And I can see in the apache log that BingPreview is getting blocked (note the 403):

157.55.39.163 - - [22/Aug/2018:18:12:09 +0000] "GET /testerror HTTP/1.1" 403 4385 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b"
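If you would rather not wait for the next preview hit to show up in the log, a request with a spoofed User-Agent can confirm the block. The following is only a rough sketch using Python's standard library; the helper names build_request and status_for are made up for illustration, and the URL in the usage note is a placeholder for your own site.

```python
import urllib.error
import urllib.request

# User-Agent string copied from the log entry above.
BING_PREVIEW_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ "
    "(KHTML, like Gecko) BingPreview/1.0b"
)


def build_request(url, user_agent):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this User-Agent."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the status code is on the exception.
        return err.code
```

Calling status_for("https://your-site.example/testerror", BING_PREVIEW_UA) should return 403 once the rule is active, while the same call with an ordinary browser User-Agent should still reach the app.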

Upvotes: 20

MrWhite

Reputation: 46012

But when I reload apache, I get the error negative Require directive has no effect in <RequireAny> directive

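<RequireAny> is the default, implied container when none is stated explicitly. A negated Require can only deny access, never grant it, so on its own inside <RequireAny> it has no effect, which is what the error says.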

You can resolve this by doing something like the following instead:

SetEnvIf User-Agent "BingPreview" bad_user

<Directory /path/top/my/app/>
   <Files wsgi.py>
       <RequireAll>
       Require all granted
       Require not env bad_user
       </RequireAll>
   </Files>
</Directory>

The "regex" BingPreview is the same as ^.*BingPreview.*$, because an unanchored pattern already matches anywhere in the string.
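The equivalence of the short and the padded pattern is easy to check outside Apache. Here is a small sketch using Python's re module; note that Apache uses PCRE rather than Python's engine, so this only illustrates the point for simple patterns like these two. The second User-Agent string and the helper name same_result are made up for the example.

```python
import re

# User-Agent from the question's log entry, plus an invented non-matching one.
BING_PREVIEW_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ "
    "(KHTML, like Gecko) BingPreview/1.0b"
)
OTHER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"

short = re.compile(r"BingPreview")          # unanchored: matches anywhere
padded = re.compile(r"^.*BingPreview.*$")   # anchored, padded with .* on both sides


def same_result(user_agent):
    """True if both patterns agree on whether this User-Agent matches."""
    return bool(short.search(user_agent)) == bool(padded.search(user_agent))


for ua in (BING_PREVIEW_UA, OTHER_UA):
    print(bool(short.search(ua)), bool(padded.search(ua)), same_result(ua))
```

Both patterns accept the BingPreview string and reject the other one, so the shorter form is enough in SetEnvIf and RewriteCond alike.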

Upvotes: 6
