Reputation: 2174
I have a problem with empty user agents in my error logs, most of which occur on the 404 page where I track all errors for further rectification or exclusion.
I have found a few solutions I can implement, so this is not too much of a problem. My real question is this: since most of these errors have an empty HTTP_USER_AGENT,
it looks to me like it is not a real user but a robot probing my system for loopholes. What I want to do is set up some sort of trap for this, but I am unsure about one thing: friendly bots like Googlebot or Yahoo! Slurp, which I prefer to keep unblocked so they can crawl my site. Do these friendly bots always send an HTTP_USER_AGENT
that I can identify them by, so that I am not blocking them accidentally? My second question: what is the right way to go about this? Any code or pointers will help.
Thanks in advance, and forgive me if my question is not entirely about being stuck and looking for a solution in the code creation process. I am just tired of all the recent SPAM activity on my site and have nowhere else to turn for reliable solutions.
I have to edit the question to make it clear.
Is it safe to just issue 403 if HTTP_USER_AGENT is empty?
One example is the following request to a page that never existed on my server: STATUS 301, COUNTRY China, USER AGENT BitTorrent, then the same IP again with a blank user agent.
GET /announce?info_hash=%8E%D0%80%01%B7K7%DBb%CF%83%82%B3%93%8E%A0wi%90%D4&peer_id=%2DSD0100%2D%09B%12%19%5FYi%2B%0C%00%C9Q&ip=192.168.1.101&port=14706&uploaded=880755775&downloaded=880755775&left=1101004800&numwant=200&key=26441&compact=1 HTTP/1.0
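To make the second question concrete, here is a minimal sketch (in Python, using only the standard library; the names and the WSGI setup are illustrative, not taken from the asker's stack) of issuing a 403 when the User-Agent header is missing or blank:

```python
# Illustrative sketch: reject requests with a missing or blank
# User-Agent header. In WSGI, an absent header means the
# HTTP_USER_AGENT key is simply not present in the environ dict.

def should_block(user_agent):
    """Treat a missing or whitespace-only User-Agent as suspicious."""
    return user_agent is None or user_agent.strip() == ""

def app(environ, start_response):
    ua = environ.get("HTTP_USER_AGENT")  # absent key -> None
    if should_block(ua):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]
```

Well-behaved crawlers do send a User-Agent, so the risk of blocking a friendly bot this way is low, but as the answer below notes, the header can be anything the client chooses.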
Upvotes: 0
Views: 1577
Reputation: 48101
Yes, most bots (Google/Yahoo) set their user agent, but you should never rely on it alone.
For instance, Googlebot may visit your website with a standard browser user agent (such as: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36) to check whether it is served anything different.
This is meant to prevent webmasters from optimizing the website just for Googlebot while serving a different page to users.
If you see too much traffic from a certain bot, the best option is to block its address.
Upvotes: 1