pokemon39

Reputation: 11

How can I get Google search result positions with PHP?

How can I get the position of a site in Google search results with PHP?

Upvotes: 1

Views: 1233

Answers (1)

John

Reputation: 7836

The core of your question is: how many keywords and positions are you looking to extract from Google's result sets?

The suggested Google search APIs are worthless if you want accurate positions. It is also important that the rank/position of a website does not depend only on the keyword. It also depends on at least:

* safe-search and similar options
* the number of results per page (you need to stick to 10 results)
* your location (using the &hl parameter helps a lot in getting around this limitation)
* the quality of your IP/proxy (an abusive history of an IP can change the result set)
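To keep those factors constant between runs, you could build every search URL through one function. This is only a sketch: build_search_url is a hypothetical helper, and the num, start and safe parameters and their values are assumptions that Google may change or ignore.

```php
<?php
// Hypothetical helper: builds a search URL with fixed result-shaping
// parameters so that positions stay comparable between runs.
function build_search_url(string $keyword, string $hl = 'en', int $page = 0): string
{
    $params = [
        'q'     => $keyword,    // the keyword to check
        'hl'    => $hl,         // the &hl parameter mentioned above
        'num'   => 10,          // assumption: stick to 10 results per page
        'start' => $page * 10,  // assumption: offset of the first result on this page
        'safe'  => 'off',       // assumption: keep safe-search the same on every request
    ];
    // http_build_query() URL-encodes the values and joins them with "&"
    return 'https://www.google.com/search?' . http_build_query($params);
}

echo build_search_url('php rank checker'), "\n";
```

The point is less the exact parameters than having a single place where they are set, so two checks of the same keyword are really comparable.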

My first question (how many keywords you are looking to scrape from Google) is essential. If you hit Google with more than a few requests, you'll receive captchas, graylisting and similar trouble. You can make about 500 requests per day (well spread) with one IP, so for larger-scale keyword analysis you need proxies.
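As a rough sketch of what "well spread" could mean in code: the two helpers below are hypothetical, the 500 per day figure is the estimate from above, and the 30% jitter is just an assumption to avoid a perfectly regular request pattern.

```php
<?php
// Hypothetical helper: seconds to wait before the next request so that
// $perDay requests are spread over 24 hours, with some random jitter.
function next_delay_seconds(int $perDay = 500, float $jitter = 0.3): int
{
    $base   = 86400 / $perDay;               // 172.8 s for 500 requests/day
    $spread = (int) round($base * $jitter);  // +/- 30% by default
    return (int) round($base) + random_int(-$spread, $spread);
}

// Hypothetical helper: how many IPs/proxies a daily request volume needs
// if each IP is limited to $perIp requests per day.
function proxies_needed(int $requestsPerDay, int $perIp = 500): int
{
    return (int) ceil($requestsPerDay / $perIp);
}

echo proxies_needed(2000), " proxies, next delay ", next_delay_seconds(), " s\n";
```

In a real script you would call sleep(next_delay_seconds()) between two requests that go out through the same IP.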

PHP is well suited to scraping Google ranks: you can use libcurl and the DOM parser to fetch the pages and process the raw HTML. Get the source code of the PHP Google-rank-checker here: http://google-rank-checker.squabbel.com It contains all you need and is open source.

If you want to do it all by yourself, here is some help to get you started:

Use libcurl for accessing Google. libcurl can manage cookies and supports proxies, timeouts and so on. It also lets you set HTTP headers, so you can use a user-agent string of your choice. You don't want to have "PHP script" or similar stuff in there, eh?

Example code:

  $ch = curl_init();
  curl_setopt($ch, CURLOPT_HEADER, 0);          // keep response headers out of $data
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the page instead of printing it
  $curl_proxy = "$ip:$port";                    // $ip and $port: address of your proxy
  curl_setopt($ch, CURLOPT_PROXY, $curl_proxy); // comment this out to test without a proxy
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
  curl_setopt($ch, CURLOPT_TIMEOUT, 20);
  curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.0; en; rv:1.9.0.4) Gecko/2009011913 Firefox/3.0.6");
  $url = "your google search q=";               // placeholder: put your search URL here
  curl_setopt($ch, CURLOPT_URL, $url);
  $data = curl_exec($ch);                       // false on failure, the HTML on success
  curl_close($ch);

Now, for parsing $data you use DOM. The DOM parser of PHP can walk through HTML code much like a real browser does. With simple strstr/substr/regex you'd have an extremely hard time extracting the various Google results/ranks; I've tried it and was not successful.

Google stores the results in <li> tags (that changes from time to time, so keep up to date):

    $dom = new DOMDocument;
    $dom->strictErrorChecking = false;
    $dom->preserveWhiteSpace = true;
    @$dom->loadHTML($data);   // $data is the page fetched with cURL above; @ hides warnings about sloppy HTML
    $lists = $dom->getElementsByTagName('li');
    $results = array();
    foreach ($lists as $list)
    {
        // now go through the <li> nodes and get the content
        // if you are stuck, check the PHP code at google-rank-checker.squabbel.com, it contains a working function
    }
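To give an idea of what the loop body could look like, here is a minimal sketch that returns the position of a domain. find_position is a hypothetical helper, and the sample markup is made up for the example, not real Google output; the assumption that every <li> with a link is one result will need adjusting to whatever Google currently sends.

```php
<?php
// Hypothetical helper: returns the 1-based position of the first result whose
// link points at $domain (or a subdomain of it), or null if it is not found.
function find_position(string $html, string $domain): ?int
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html);  // @ hides warnings about sloppy markup
    $position = 0;
    foreach ($dom->getElementsByTagName('li') as $li) {
        $links = $li->getElementsByTagName('a');
        if ($links->length === 0) {
            continue;        // assumption: an <li> without a link is not a result
        }
        $position++;
        $host = parse_url($links->item(0)->getAttribute('href'), PHP_URL_HOST);
        $pattern = '/(^|\.)' . preg_quote($domain, '/') . '$/i';
        if (is_string($host) && preg_match($pattern, $host)) {
            return $position;
        }
    }
    return null;
}

// Made-up sample markup, only there to exercise the function.
$sample = '<html><body><ol>'
    . '<li><a href="http://example.org/a">First</a></li>'
    . '<li><a href="http://www.example.com/b">Second</a></li>'
    . '<li><a href="http://example.net/c">Third</a></li>'
    . '</ol></body></html>';

var_dump(find_position($sample, 'example.com'));
```

For ranks beyond the first page you would add the page offset (page number times 10) to the returned position.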
    

You have a lot of work ahead. Make sure you don't flood Google with requests, and make sure you detect captchas in case your script goes "insane". Use proper proxies; check the article and code I mentioned for details on that.
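A simple captcha check could look roughly like the sketch below. looks_blocked is a hypothetical helper, and both the status codes and the text markers are assumptions about what a block page looks like, so verify them against the responses you actually get.

```php
<?php
// Hypothetical helper: guesses whether a response is a block/captcha page
// rather than a normal result page.
function looks_blocked(int $httpStatus, string $body): bool
{
    // assumption: rate limiting shows up as 429 or 503
    if ($httpStatus === 429 || $httpStatus === 503) {
        return true;
    }
    // assumption: typical fragments of a block page; a normal result page
    // could in rare cases contain them too
    foreach (['unusual traffic', '/sorry/'] as $marker) {
        if (stripos($body, $marker) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(looks_blocked(200, '<li><a href="http://example.com/">ok</a></li>'));
```

The status code is available through curl_getinfo($ch, CURLINFO_HTTP_CODE), which has to be called before curl_close($ch). When the check fires, stop or pause that IP instead of retrying right away.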

Upvotes: 1
