hollo

Reputation: 23

Small PHP code for telling real search spiders apart from spammers

Hi, I would like your opinions on this code, which I found on a website, for telling real search spiders apart from spammers. Is it any good? Do you have any recommendations for other scripts or methods for this?

<?php
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$ip = $_SERVER['REMOTE_ADDR'];

// Spider name (as found in the user agent) => pattern its hostname must match.
// Keyed pairs keep each spider tied to its own pattern; with the parallel
// arrays, msnbot was being checked against the google.com pattern.
$spiders = array(
    'msnbot'    => '/search\.live\.com$/',
    'googlebot' => '/\.google(bot)?\.com$/',
    'yahoo'     => '/\.yahoo\.com$/',
);

// Find out which spider, if any, the user agent claims to be.
$pattern = null;
foreach ($spiders as $name => $hostPattern) {
    if (stristr($ua, $name)) {
        $pattern = $hostPattern;
        break;
    }
}

if ($pattern === null) {
    // The UA does not claim to be a search spider.
    echo "hello user";
} else {
    // The UA claims to be a spider, so the reverse DNS name must match...
    $hostname = gethostbyaddr($ip);
    // ...and the forward lookup of that name must return the same IP.
    if (!$hostname || !preg_match($pattern, $hostname) || gethostbyname($hostname) !== $ip) {
        echo "spammer";
        exit;
    }
    // real bot
}

Note: I tested this code with User Agent Switcher and it worked perfectly, but I am not sure whether it will work in the real world. What do you think?

Upvotes: 0

Views: 887

Answers (2)

Mary Daisy Sanchez

Reputation: 1057

You can also use .htaccess rules to block this kind of traffic, as in this tutorial: http://perishablepress.com/press/2007/06/28/ultimate-htaccess-blacklist/

Upvotes: 0

Pekka

Reputation: 449385

What would keep a spammer from simply giving an entirely correct user agent string?

I think this is fairly pointless. You would have to at least compare IP ranges (or their name servers) as well in order to get reliable results. This is possible for Google:

Google Webmaster Central: How to verify Googlebot
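The check described there is a reverse DNS lookup followed by a confirming forward lookup. A minimal sketch of that idea follows. The function name `is_verified_crawler` and the two resolver parameters are hypothetical, added only so the logic can be tried without real DNS. The allowed domains are passed in by the caller, and `gethostbynamel()` covers IPv4 only.

```php
<?php
/**
 * Forward-confirmed reverse DNS check (illustrative sketch).
 * $reverse and $forward are hypothetical injection points; they default to
 * PHP's own resolver functions.
 */
function is_verified_crawler($ip, array $allowedDomains, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel')
{
    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged on
    // failure and false on malformed input.
    $host = call_user_func($reverse, $ip);
    if (!is_string($host) || $host === '' || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Step 2: the hostname must end in ".<allowed domain>".
    $matches = false;
    foreach ($allowedDomains as $domain) {
        $suffix = '.' . ltrim(strtolower($domain), '.');
        if (substr($host, -strlen($suffix)) === $suffix) {
            $matches = true;
            break;
        }
    }
    if (!$matches) {
        return false;
    }

    // Step 3: the forward lookup must include the original IP.
    // gethostbynamel() returns a list of IPv4 addresses, or false on failure.
    $ips = call_user_func($forward, $host);
    return is_array($ips) && in_array($ip, $ips, true);
}

// Possible usage in a request handler:
// $ok = is_verified_crawler($_SERVER['REMOTE_ADDR'], array('googlebot.com', 'google.com'));
```

Matching on a leading dot plus the domain means a hostname such as `evilgooglebot.com` is rejected, which an unanchored substring test would let through.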

but even if you test for Google and Bing this way, a spambot can still enter your site simply by sending a browser user agent. Therefore, it is ultimately impossible to detect a spam-bot. They are a reality, and there is no good way to keep them out of a web site.
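For the IP-range comparison mentioned above, a small sketch of a CIDR membership test is shown here. The names `ip_in_cidr` and `ip_in_any_range` are hypothetical, and the ranges in the usage comment are reserved documentation addresses, not real crawler ranges; those would have to come from each search engine's own published list. It assumes IPv4 and a 64-bit PHP build.

```php
<?php
/**
 * Return true if an IPv4 address lies inside a CIDR block such as "192.0.2.0/24".
 * Illustrative sketch: IPv4 only, assumes 64-bit integers.
 */
function ip_in_cidr($ip, $cidr)
{
    if (strpos($cidr, '/') === false) {
        return false;
    }
    list($subnet, $bits) = explode('/', $cidr, 2);
    $bits = (int) $bits;

    $ipLong = ip2long($ip);
    $subnetLong = ip2long($subnet);
    if ($ipLong === false || $subnetLong === false || $bits < 0 || $bits > 32) {
        return false;
    }

    // Build the netmask; a /0 prefix matches every address.
    $mask = ($bits === 0) ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);
    return ($ipLong & $mask) === ($subnetLong & $mask);
}

/** Return true if the address falls inside any of the given CIDR blocks. */
function ip_in_any_range($ip, array $ranges)
{
    foreach ($ranges as $cidr) {
        if (ip_in_cidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}

// Possible usage, with placeholder ranges:
// $known = ip_in_any_range($_SERVER['REMOTE_ADDR'], array('192.0.2.0/24', '198.51.100.0/24'));
```

As the answer points out, this only validates clients that claim to be a crawler; a bot that sends an ordinary browser user agent never reaches such a check.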

Upvotes: 3
