Adrian Tanase

Reputation: 700

converting IP to ADDRESS takes too much time

I have a website project that counts views of certain pages. I store each view as an IP address and a date, so when a user clicks several times on the same page, it counts only once.

I also want to keep search bots from being counted as real users when they access my website.

On other sites I usually do this by converting the IP to its real address (the hostname) and comparing it with strings such as 'google'.

But the conversion sometimes takes 3-4 seconds, which makes my website slower.

How can I tackle this problem?

UPDATE

I researched some more and came up with this simple solution (based on your answers):

$global_bots = array("bot", "slurp", "spider", "crawl", "archiver", "facebook");

// this variable above goes in my global settings file which is included in all my files

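// some clients send no User-Agent header at all, so fall back to an empty string
$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';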

$bot_count = 0;
do{
    $pos = stripos($user_agent, $global_bots[$bot_count]);
    $bot_count++;
}while($pos===false && ( $bot_count < count($global_bots) ) );

if($pos!==false){
   //a bot string matched: don't record it, it's most likely a BOT
}else{
   //check if user already clicked today on this page, else record his ip
}

If you have any other updates to this, such as strings found in bots, feel free to jump in with quick answers.

Thanks.

Upvotes: 0

Views: 85

Answers (2)

Álvaro González

Reputation: 146490

It isn't a "conversion" (in the sense of maths being involved): it's a lookup against an external database (the DNS server). You should use the same rules as in any other external service lookup:

  • Store the results you get so you don't have to query again.
  • Delay the task, possibly to a command-line cron job (visitor stats do not normally need to be processed in real time).
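
As a rough illustration of the first point, here is a minimal sketch of a cached reverse lookup. The function name `cached_hostname`, the JSON cache file and the injectable `$resolver` parameter are all made up for this example; only `gethostbyaddr()` is a real PHP function. A database table would probably serve better than a file on a busy site.

```php
<?php
// Hypothetical helper (not from the question): resolve an IP to a hostname
// and remember the answer in a JSON file so each IP is looked up only once.
// $resolver defaults to PHP's gethostbyaddr(); it is a parameter only so the
// lookup can be swapped out, e.g. in tests.
function cached_hostname($ip, $cacheFile, $resolver = 'gethostbyaddr')
{
    $cache = array();
    if (is_file($cacheFile)) {
        $decoded = json_decode(file_get_contents($cacheFile), true);
        if (is_array($decoded)) {
            $cache = $decoded;
        }
    }

    if (!array_key_exists($ip, $cache)) {
        // This is the slow part: a DNS query that can take seconds.
        $host = call_user_func($resolver, $ip);
        // gethostbyaddr() returns the unmodified IP when the lookup fails
        // and false on malformed input; store the IP itself in both cases.
        $cache[$ip] = ($host === false) ? $ip : $host;
        file_put_contents($cacheFile, json_encode($cache), LOCK_EX);
    }

    return $cache[$ip];
}
```

Called from a cron script rather than from the page itself, this keeps the DNS delay out of the request entirely: the page only stores the raw IP, and the script resolves the IPs it has not seen yet.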

Once you know this, my humble advice is that you don't do it:

  • How do you plan to compile and maintain a decent database? There must be a million crawlers out there.
  • You assume that crawlers always run on dedicated servers with a public IP address, which is not true.

The usual (non-exclusive) approaches to distinguish bots are:

  • User agent string
  • Ability to run JavaScript

You can probably borrow a user agent database (or even a user agent detection library).

Upvotes: 2

Étienne Miret

Reputation: 6660

Well-behaved bots will first request the /robots.txt path, while humans won’t usually request it at all. So you can identify bots by looking for User-Agents that first request this path.
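
A minimal sketch of that idea, assuming you already log requests somewhere in chronological order. The function name and the `ip` / `ua` / `path` array keys are hypothetical, and grouping clients by IP plus User-Agent is my own assumption (many humans share the same browser User-Agent string, so the User-Agent alone seemed too coarse).

```php
<?php
// Hypothetical helper: given requests in chronological order, return the
// clients (identified here as "ip|user agent") whose very first request
// was for /robots.txt.
// Each request is array('ip' => ..., 'ua' => ..., 'path' => ...).
function clients_starting_with_robots_txt(array $requests)
{
    // Remember only the first path seen for each client.
    $firstPath = array();
    foreach ($requests as $request) {
        $client = $request['ip'] . '|' . $request['ua'];
        if (!isset($firstPath[$client])) {
            $firstPath[$client] = $request['path'];
        }
    }

    // Keep the clients that started with robots.txt.
    $bots = array();
    foreach ($firstPath as $client => $path) {
        if ($path === '/robots.txt') {
            $bots[] = $client;
        }
    }
    return $bots;
}
```

This only catches bots that actually fetch robots.txt before anything else, so it is probably best combined with the User-Agent check rather than used on its own.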

Upvotes: 0
