Reputation: 700
I have a website project that counts views of certain pages. I store each view as an IP address and date, so when a user clicks several times on the same page it only counts once.
The thing is, I'd also like to exclude search bots from counting as real users when they access my website.
I usually do this on other sites by resolving the IP to a hostname (reverse DNS) and comparing the result against strings such as 'google'.
But that lookup sometimes takes 3-4 seconds, which makes my website slower.
How can I tackle this problem?
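For reference, the slow approach I describe above looks roughly like this (a sketch; the 'google' comparison string is just an example):

```php
<?php
// Reverse-DNS lookup per request: gethostbyaddr() blocks until the DNS
// server answers (or times out), which is where the 3-4 second delay comes from.
$ip   = $_SERVER['REMOTE_ADDR'];
$host = gethostbyaddr($ip);   // returns a hostname string, or the IP/false on failure
if ($host !== false && stripos($host, 'google') !== false) {
    // looks like a Google crawler: don't count this view
}
```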
UPDATE
I researched some more and came up with this simple solution (based on your answers):
$global_bots = array("bot", "slurp", "spider", "crawl", "archiver", "facebook");
// the variable above goes in my global settings file, which is included in all my files

$user_agent = $_SERVER['HTTP_USER_AGENT'];
$bot_count  = 0;
do {
    $pos = stripos($user_agent, $global_bots[$bot_count]);
    $bot_count++;
} while ($pos === false && $bot_count < count($global_bots));

if ($pos !== false) {
    // a bot substring was found in the user agent: don't record it, it's most likely a BOT
} else {
    // no bot substring matched: check if this IP already clicked today on this page, else record it
}
If you have any other updates to this, such as other substrings that appear in bot user agents, feel free to jump in with quick answers.
Thanks.
Upvotes: 0
Views: 85
Reputation: 146490
It isn't a "conversion" (in the sense of maths being involved): it's a lookup against an external database (the DNS server), so you should apply the same rules as to any other external service lookup.
Once you know this, my humble advice is that you don't do it on every page view.
The usual (non-exclusive) approaches to distinguishing bots rely on the User-Agent header instead.
You can probably borrow a user-agent database (or even a user-agent detection library).
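If you do keep the reverse-DNS check, the standard rule for an external service lookup is to cache the result so each IP is resolved at most once. A minimal sketch, assuming a simple in-memory array as the cache (in practice you'd use APCu, Memcached, or a database table):

```php
<?php
// Cache the reverse-DNS verdict per IP so the slow lookup runs only once.
$host_cache = array();   // placeholder cache; swap in a persistent store

function hostname_cached($ip, array &$cache) {
    if (!isset($cache[$ip])) {
        $cache[$ip] = gethostbyaddr($ip);   // slow call happens once per IP
    }
    return $cache[$ip];
}
```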
Upvotes: 2
Reputation: 6660
Well-behaved bots will request the /robots.txt path first, while humans usually won't request it at all. So you can identify bots by looking for user agents whose first request is for this path.
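A minimal sketch of this idea, assuming a flat file `bot_ips.txt` as the store (a hypothetical name; a database table would work too):

```php
<?php
// When serving robots.txt (e.g. via a front controller), remember the caller's IP.
if ($_SERVER['REQUEST_URI'] === '/robots.txt') {
    file_put_contents('bot_ips.txt', $_SERVER['REMOTE_ADDR'] . "\n", FILE_APPEND);
}

// Later, when counting a page view, skip IPs that fetched robots.txt.
$bot_ips = file_exists('bot_ips.txt')
    ? file('bot_ips.txt', FILE_IGNORE_NEW_LINES)
    : array();
$is_bot = in_array($_SERVER['REMOTE_ADDR'], $bot_ips);
```

Note this only catches well-behaved crawlers; impolite bots that skip robots.txt still need the user-agent check.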
Upvotes: 0