Reputation: 186
I have an Nginx server that serves two websites. My question: is there a way to detect and block all GET requests whose User-Agent header has been modified? Then I could be sure that no one can scrape my posts. I am keen to use Netfilter for this, but I am not sure whether it is powerful enough.
Upvotes: 1
Views: 2052
Reputation: 99
Simple answer: no.
You can look at the User-Agent header, which, depending on the scraper, might reveal it and make it obvious. However, nothing stops me (or anyone else) from sending a User-Agent identical to that of a normal browser: the client, whether it is a browser or a script written by a programmer, has full control over every header it sends.
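For illustration, here is a minimal sketch of such a check in Nginx configuration; the patterns are just the default strings of a few common tools, so it catches nothing that bothers to change its User-Agent:

    # Inside a server block: reject requests whose User-Agent matches
    # the default strings of common scraping tools. Any client that
    # sends a browser-like User-Agent sails straight past this.
    if ($http_user_agent ~* "(curl|wget|python-requests|scrapy)") {
        return 403;
    }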
You could try a whitelist and block everything not on it, but then you would quickly end up blocking non-mainstream browsers, and you could easily start blocking new versions of mainstream browsers too. It would need constant updating and maintenance, and it would still be very easy to circumvent.
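A rough sketch of what that whitelist could look like, using an Nginx map; the patterns are illustrative and deliberately incomplete, which is exactly the maintenance problem:

    # http context: flag anything whose User-Agent does not match a
    # mainstream browser. Patterns are illustrative, not exhaustive.
    map $http_user_agent $ua_blocked {
        default      1;   # block by default
        ~*firefox    0;
        ~*chrome     0;
        ~*safari     0;
    }

    server {
        listen 80;
        server_name example.com;   # placeholder

        # Reject everything the whitelist did not match. A scraper only
        # has to copy one of the allowed strings to get through.
        if ($ua_blocked) {
            return 403;
        }
    }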
A blacklist simply would not work, because you cannot predict which User-Agent string a developer will tell the scraper to send; a check like the one sketched above only catches tools left on their defaults.
Now, in theory, you could analyze user behaviour and make decisions based on that. However, this requires a fair amount of work, can easily become a nuisance to legitimate visitors, and is still unlikely to stop a well-written scraper.
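If you do go down that road, the cheapest behavioural signal is request rate. Here is a hedged sketch using Nginx's built-in limit_req module; the zone size, rate, and burst values are assumptions you would tune to your own traffic:

    # http context: track clients by IP and allow a sustained rate a
    # human reader would rarely exceed. Values here are guesses.
    limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

    server {
        listen 80;
        server_name example.com;   # placeholder

        location / {
            # Allow short bursts (a page plus its assets), then answer
            # 429 to anything faster. A patient scraper just slows down.
            limit_req zone=perip burst=10 nodelay;
            limit_req_status 429;
        }
    }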
Upvotes: 2