Ashkan Kamyab

Reputation: 186

Detecting fake user agents (aka scrapers) making GET requests to my web server?

I have an Nginx server serving two websites. My question is whether there is a way to detect and block all GET requests whose User-Agent header has been modified, so I can be sure that no one can scrape my posts. I am keen to use Netfilter for this, but I am not sure whether it is powerful enough.

Upvotes: 1

Views: 2052

Answers (1)

HumaneWolf

Reputation: 99

Simple answer: no.

You can look at the User-Agent header, which, depending on the scraper, might reveal it and make it obvious. However, nothing stops me (or anyone else) from sending a user agent identical to a normal browser's: the client, whether it is a browser or a script written by a programmer, has full control over that header.
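For example, here is a minimal sketch (using Python's standard library; the URL is a placeholder) of a request that is indistinguishable, header-wise, from a real Chrome browser:

    import urllib.request

    # Any client can claim to be any browser: the User-Agent header is
    # entirely under the sender's control. This string mimics a real
    # Chrome release; the URL below is just a placeholder.
    req = urllib.request.Request(
        "https://example.com/posts",
        headers={
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0.0.0 Safari/537.36"
            )
        },
    )

    with urllib.request.urlopen(req) as resp:
        html = resp.read()  # on the server side, this looks like Chrome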

You could try a whitelist, blocking everything not on it, but then you would quickly end up blocking any non-mainstream browser, not to mention you could also easily start blocking new versions of mainstream browsers. It would need constant updating and maintenance, and still be very easy to circumvent.

A blacklist simply would not work, as you can't predict what kind of user agent a developer can "tell" the scraper to use.
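To make that concrete, here is a minimal sketch of both list approaches (the patterns are illustrative, not a real browser database). Note how the spoofed Chrome string from the earlier example passes either check:

    # Illustrative patterns only, not a real browser database.
    WHITELIST_PREFIXES = ("Mozilla/5.0",)   # matches every modern browser
    BLACKLIST_MARKERS = ("curl/", "python-requests/", "Scrapy/")

    def allowed(user_agent):
        # Blacklist: only catches clients that announce themselves.
        if any(marker in user_agent for marker in BLACKLIST_MARKERS):
            return False
        # Whitelist: anything that claims to be a browser gets through.
        return user_agent.startswith(WHITELIST_PREFIXES)

    print(allowed("python-requests/2.31.0"))
    # False: an honest client announces itself and is blocked
    print(allowed("Mozilla/5.0 (Windows NT 10.0; Win64; x64) ..."))
    # True: a spoofed browser string sails through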

Now, in theory, you could analyze user behaviour and make decisions based on that. However, this would require a decent amount of work, can very easily become a nuisance to legitimate traffic, and is still likely to fail against a well-built scraper.
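For what it is worth, a crude version of that idea is just a per-IP rate heuristic. The window and threshold below are arbitrary assumptions, and a patient scraper defeats it simply by slowing down:

    import time
    from collections import defaultdict, deque

    # Assumed thresholds: more than 30 requests in a 10-second window
    # from one IP is treated as automated. A careful scraper stays
    # under the limit, while a NAT'd office of real users may exceed it.
    WINDOW_SECONDS = 10
    MAX_REQUESTS = 30

    _history = defaultdict(deque)

    def looks_automated(client_ip):
        now = time.monotonic()
        hits = _history[client_ip]
        hits.append(now)
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        return len(hits) > MAX_REQUESTS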

Upvotes: 2
