Reputation: 3616
I have a bunch of URLs which are currently indexed in Google. Given those URLs, is there a way to figure out when Google last crawled them?
Manually, if I look a URL up in Google and click its 'Cached' link, I can see the date it was crawled. Is there a way to do this automatically? A Google API of some sort?
Thank you :)
Upvotes: 1
Views: 2138
Reputation: 559
<?php
$domain_name = $_GET["url"];

// Scrape the "as it appeared on ..." timestamp from Google's cache page.
function googlebot_lastaccess($domain_name)
{
    $request = 'http://webcache.googleusercontent.com/search?hl=en&q=cache:' . $domain_name . '&btnG=Google+Search&meta=';
    $data = getPageData($request);
    $spl = explode("as it appeared on", $data);
    if (count($spl) < 2) {
        return 0; // marker not found: page is not cached (or Google changed the markup)
    }
    $spl2 = explode(".<br>", $spl[1]);
    $value = trim($spl2[0]);
    return (strlen($value) == 0) ? 0 : $value;
}

$content = googlebot_lastaccess($domain_name);
$date = substr($content, 0, strpos($content, 'GMT') + strlen('GMT'));
echo "Googlebot last access = " . $date . "<br />";

// Fetch a URL with cURL if available, falling back to file_get_contents().
function getPageData($url)
{
    if (function_exists('curl_init')) {
        $ch = curl_init($url); // initialize cURL with the given URL
        // reuse the visitor's user agent, with a generic fallback for CLI runs
        $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'Mozilla/5.0';
        curl_setopt($ch, CURLOPT_USERAGENT, $ua);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response as a string
        if ((ini_get('open_basedir') == '') && (ini_get('safe_mode') == 'Off')) {
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
        }
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // max. seconds to wait for a connection
        curl_setopt($ch, CURLOPT_FAILONERROR, 1); // stop when it encounters an HTTP error
        return @curl_exec($ch);
    }
    else {
        return @file_get_contents($url);
    }
}
?>
Just upload this PHP script and create a cron job to run it. You can test it as follows: .../bot.php?url=http://www....
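Note that if the cron job runs the script through the PHP CLI rather than requesting it over HTTP, $_GET won't be populated. A minimal sketch of a tweak for that (the argv handling is my addition, not part of the original script):

<?php
// Replace the first assignment in the script above with something like this,
// so the URL can come from the query string (web) or from argv (CLI/cron).
$domain_name = isset($_GET["url"]) ? $_GET["url"]
             : (isset($argv[1]) ? $argv[1] : '');
if ($domain_name === '') {
    die("Usage: php bot.php <url>  or  bot.php?url=<url>\n");
}
?>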
Upvotes: 2
Reputation: 29
You can check Googlebot's last visit using http://www.gbotvisit.com/
Upvotes: 0
Reputation: 2724
Google doesn't provide an API for this type of data. The best way to track last-crawled dates is to mine your server logs.
In your server logs, you should be able to identify Googlebot by its typical user agent: Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html). From there you can see which URLs Googlebot has crawled, and when; see the sketch below.
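A minimal PHP sketch of that log mining, assuming a combined-format Apache/Nginx access log (the path is my assumption; point it at your own log file):

<?php
// List Googlebot requests from a combined-format access log.
$log = '/var/log/apache2/access.log';
foreach (file($log) as $line) {
    if (strpos($line, 'Googlebot') === false) {
        continue; // skip requests from other clients
    }
    // combined format: IP ident user [date] "METHOD path HTTP/x.x" status ...
    if (preg_match('/^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)/', $line, $m)) {
        echo $m[2] . "  " . $m[3] . "\n"; // crawl time and crawled URL
    }
}
?>

Each output line gives you a timestamp and the URL Googlebot requested; the last occurrence per URL is its last-crawled date.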
If you want to be sure it really is Googlebot crawling those pages, you can verify it with a reverse DNS lookup, as sketched below. (Bingbot supports reverse DNS verification as well.)
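A hedged PHP sketch of that double DNS check (the sample IP is purely illustrative; use addresses taken from your own logs):

<?php
// Verify a claimed Googlebot IP: reverse lookup, then forward-confirm.
function isRealGooglebot($ip)
{
    $host = gethostbyaddr($ip); // reverse DNS: IP -> hostname
    if (!preg_match('/\.(googlebot|google)\.com$/', $host)) {
        return false; // hostname is not under Google's crawler domains
    }
    return gethostbyname($host) === $ip; // forward DNS must map back to the same IP
}

var_dump(isRealGooglebot('66.249.66.1')); // illustrative IP only
?>

If the hostname ends in googlebot.com or google.com and the forward lookup maps back to the same IP, the visitor is genuine.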
If you don't want to parse your server logs by hand, you can always use something like Splunk or Logstash; both are great log-processing platforms.
Also note that the "cached" date in the SERPs doesn't necessarily match the last-crawled date. Googlebot can crawl your pages multiple times after the "cached" date without updating the cached version, so you can think of the "cached" date as more of a "last indexed" date, though that's not exactly right either. In any case, if you ever need to get a page re-indexed, you can always use Google Webmaster Tools (GWT). There's an option in GWT to force Googlebot to re-crawl a page, and also to re-index it. There's a weekly limit of 50 or something like that.
Upvotes: 3