Reputation: 788
I'm asking for your opinion / experiences about this.
Our CMS reads information from the HTTP_USER_AGENT string. We recently discovered a bug in the code: we forgot to check whether HTTP_USER_AGENT is present at all (which is possible, but honestly we simply skipped that check and didn't expect it to happen), and those cases resulted in an error. We corrected it and added tracking: if HTTP_USER_AGENT is not set, an alert is sent to our tracking system.
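For reference, a minimal sketch of such a check in PHP (the sendTrackingAlert() helper is hypothetical, standing in for whatever alerting mechanism the tracking system uses):

```php
<?php
// Guard against a missing User-Agent header instead of assuming it exists.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : null;

if ($userAgent === null || $userAgent === '') {
    // Hypothetical helper: report the event to the tracking system.
    sendTrackingAlert('Request without HTTP_USER_AGENT', $_SERVER['REMOTE_ADDR']);
}
```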
We now have data/statistics from many websites covering the past months, and they show this is really rare: roughly 0.05-0.1% of requests.
Another interesting observation: these requests come in isolation. We didn't find a single case where such a "user" had multiple pageviews in the same session...
This got us thinking... Should we treat these requests as robots and simply block them? Or would that be a serious mistake?
Googlebot and other "good robots" always send HTTP_USER_AGENT info.
I know that firewalls or proxy servers MAY alter (or remove) the user-agent info, but our stats don't let us confirm whether that is what is happening here...
What are your experiences? Has anyone else done any research on this topic?
Other posts I found on Stack Overflow simply accept the fact that "it is possible this info is not sent". But why don't we question that for a moment? Is it really normal?
Upvotes: 5
Views: 4508
Reputation: 788
So, let's summarize a few things, based on the reactions.
Probably the best way is to combine all possibilities. :-)
If this is the first incoming request (checking the first request of the session is enough), we can immediately check it against multiple criteria. On the server side we can maintain a dynamic database of user-agent strings / IP addresses, built by mirroring public databases. (Yes, there are several public, regularly updated databases available on the internet for identifying bots; they contain not only user-agent strings but source IPs too.)
If the request does carry a user-agent, we can quickly check it against the database. If that filter says "OK", we can mark the client as a trusted bot and serve the request (a sketch of such a lookup follows).
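A minimal sketch of that lookup, assuming a mirrored known_bots table with user_agent and ip columns (the table name and the PDO connection are illustrative assumptions):

```php
<?php
// Check the request against a locally mirrored bot database.
// The table layout (known_bots: user_agent, ip) is an assumption for illustration.
function isTrustedBot(PDO $db, string $userAgent, string $ip): bool
{
    $stmt = $db->prepare(
        'SELECT COUNT(*) FROM known_bots WHERE user_agent = :ua OR ip = :ip'
    );
    $stmt->execute([':ua' => $userAgent, ':ip' => $ip]);

    return (int) $stmt->fetchColumn() > 0;
}
```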
The problem is when there is no user-agent info in the request at all (actually, this was the origin of my question). What to do then? :-)
We need to make a decision here.
The easiest way is to simply deny these requests and consider them abnormal. Of course we may lose some real users this way, but according to our stats the risk is small, I think. We can also send back a human-readable message such as "Sorry, but your browser doesn't send user-agent info, so your request is denied" - or whatever. If it is a bot, there will be no one to read that anyway; if it is a human, we can kindly give her/him usable instructions (see the sketch below).
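A sketch of that "deny with a friendly message" option; the status code and wording are just one possible choice:

```php
<?php
// Deny requests that carry no User-Agent header at all.
if (empty($_SERVER['HTTP_USER_AGENT'])) {
    http_response_code(403);
    header('Content-Type: text/plain; charset=utf-8');
    echo "Sorry, but your browser doesn't send user-agent info, so your request was denied.\n";
    echo "Please disable any proxy or privacy tool that strips the User-Agent header and try again.\n";
    exit;
}
```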
If we decide not to deny these requests, we can use the post-tracking mechanism suggested by MrCode here: OK, we serve THAT request, but we start collecting behaviour info. How? For example, note the IP address in a database (greylist it) and reference a fake CSS file in the response - one served not statically by the webserver but by our server-side language: PHP, Java or whatever we are using. If this is a robot, it is very unlikely to download the CSS file; a real browser definitely will, probably within a very short time frame (e.g. 1-2 seconds). We can then continue the process in the action that serves the fake CSS file: look the IP up in the greylist, and if we judge the behaviour normal, whitelist that IP address (for example). A sketch of such an endpoint follows.
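A sketch of the fake-CSS endpoint, assuming a URL like /fake.css is routed to this PHP script; the greylist/whitelist helpers are hypothetical wrappers around whatever storage is used:

```php
<?php
// /fake.css is rewritten to this script. A real browser fetching the page's
// stylesheet will hit it; a typical bot will not.
$ip = $_SERVER['REMOTE_ADDR'];

// Hypothetical helpers backed by the greylist/whitelist storage.
if (isGreylisted($ip)) {
    whitelistIp($ip);          // behaviour looks like a real browser
    removeFromGreylist($ip);
}

// Serve an empty (or minimal) stylesheet so the browser is satisfied.
header('Content-Type: text/css');
echo "/* ok */\n";
```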
If we get another request from a greylisted IP address (a sketch of this check follows the list):
a) within the 1-2 second time frame: we can delay our response by a few seconds (waiting for the parallel request - maybe it will download the fake CSS in the meantime) and check our greylist db periodically to see whether the IP address has disappeared from it
b) after the 1-2 second time frame: we simply deny the request
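A sketch of that decision for a follow-up request from a greylisted IP; the greylist helpers, the 2-second window and the "deny if still greylisted after waiting" choice are assumptions:

```php
<?php
$ip = $_SERVER['REMOTE_ADDR'];

// Hypothetical helpers: greylistedAt() returns the UNIX timestamp of greylisting.
if (isGreylisted($ip)) {
    if (time() - greylistedAt($ip) <= 2) {
        // a) still inside the window: wait briefly, hoping the fake CSS
        //    request arrives in parallel and clears the greylist entry.
        for ($i = 0; $i < 4 && isGreylisted($ip); $i++) {
            usleep(500000); // 0.5 s
        }
        if (isGreylisted($ip)) {
            http_response_code(403);
            exit('Denied.');
        }
    } else {
        // b) past the window and still no CSS fetch: treat it as a bot.
        http_response_code(403);
        exit('Denied.');
    }
}
// ...otherwise continue serving the page as normal.
```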
So, something like that... How does it sound?
But this is not perfect yet, since during this mechanism we served one real page to the potential bot... I think we can avoid that too. We could send back an empty, slightly delayed redirect page for this first request. This can be done easily in the HTML HEAD section, or with JavaScript, which is another great bot filter - but it could filter out real users with JavaScript switched off too (I have to say, a visitor with no user-agent string AND JavaScript switched off can really go to hell...). Of course we can add some text to the page like "you will be redirected soon" to calm down potential real users. While this page is waiting for the redirect to happen, a real browser will download the fake CSS, so the IP will be whitelisted by the time the redirect occurs - and voila. A sketch of such a holding page is below.
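A sketch of the holding page, using a meta refresh in the HEAD plus the fake CSS link; the 3-second delay and the /real-page URL are placeholders:

```php
<?php
// Holding page for the first request from a client with no User-Agent:
// the meta refresh gives a real browser time to fetch the fake CSS
// (which whitelists the IP) before the actual page is requested.
?>
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="3;url=/real-page">
    <link rel="stylesheet" href="/fake.css">
    <title>Please wait</title>
</head>
<body>
    <p>You will be redirected soon&hellip;</p>
</body>
</html>
```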
Upvotes: 0
Reputation: 64526
I would consider the lack of a user-agent abnormal for genuine users; however, it is still a [rare] possibility, which may be caused by a firewall, proxy or privacy software stripping the user-agent.
A request missing a user-agent is most likely a bot or script (not necessarily a search engine crawler), although you can't say for sure, of course.
Other factors that may indicate a bot/script:
Upvotes: 5