Reputation: 415
I want to cURL a URL and keep track of every URL it is redirected through. So far the only way I have found is to make recursive cURL calls, which is not ideal. Perhaps I am missing an easy option. Thoughts?
$url = "some url with redirects";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1) Gecko/20061024 BonEcho/2.0");
$html = curl_exec($ch);
$info = array();
if (!curl_errno($ch))
{
    $info = curl_getinfo($ch);
    echo "<pre>";
    print_r($info);
    echo "</pre>";
}
and I get a response like this:
Array
(
[url] => THE LAST URL THAT WAS HIT
[content_type] => text/html; charset=utf-8
[http_code] => 200
[header_size] => 1942
[request_size] => 1047
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 2 <---- I WANT THESE
[total_time] => 0.799589
[namelookup_time] => 0.000741
[connect_time] => 0.104206
[pretransfer_time] => 0.104306
[size_upload] => 0
[size_download] => 49460
[speed_download] => 61856
[speed_upload] => 0
[download_content_length] => 49460
[upload_content_length] => 0
[starttransfer_time] => 0.280781
[redirect_time] => 0.400723
)
Upvotes: 11
Views: 9901
Reputation: 58254
With libcurl, you can use the CURLINFO_REDIRECT_URL getinfo variable to find out the URL a request would have been redirected to if following redirects were enabled. This lets programs easily traverse the redirects themselves.
This approach is much better and easier than parsing Location: headers, as the other answers here suggest, because with parsing your code must rebuild relative paths and so on. CURLINFO_REDIRECT_URL does that for you automatically.
The PHP/CURL binding added support for this feature in PHP 5.3.7:
$url = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
The commit that fixed this:
https://github.com/php/php-src/commit/689268a0ba4259c8f199cae6343b3d17cab9b6a5
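As a rough sketch of how that traversal could look, the loop below disables CURLOPT_FOLLOWLOCATION and asks for CURLINFO_REDIRECT_URL after every request. The function name traceRedirects and the $maxHops limit are my own assumptions for illustration, not part of libcurl or the question, and the use of CURLOPT_NOBODY assumes the servers involved answer HEAD-style requests the same way they answer GET.

```php
<?php
// Hypothetical helper: returns every URL in the redirect chain, starting URL first.
// $maxHops is an assumed safety limit against redirect loops.
function traceRedirects($url, $maxHops = 10)
{
    $chain = array($url);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // step through the hops ourselves
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only; remove if a server mishandles this
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);

    for ($hop = 0; $hop < $maxHops; $hop++) {
        curl_setopt($ch, CURLOPT_URL, $url);
        if (curl_exec($ch) === false) {
            break; // transfer failed; return what was collected so far
        }
        // Absolute URL of the next hop; empty or false when the response is not a redirect.
        $next = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
        if (!$next) {
            break;
        }
        $chain[] = $next;
        $url = $next;
    }
    return $chain;
}
```

Calling print_r(traceRedirects('some url with redirects')) should then list the starting URL followed by each hop. Reusing one handle keeps the connection settings in a single place; whether that suits a given case is a judgment call.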
Upvotes: 5
Reputation: 39
May I make a recommendation...
preg_match('/(Location:|URI:)(.*?)\n/', $httpheader, $matches);
Change the regex to /(Location:|URI:)(.*?)\n/i so that it is case-insensitive. HTTP header names are not case-sensitive, and I have noticed that some sites send location: with a lowercase l.
Just a thought for anyone wondering why the match sometimes fails.
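To illustrate the point, here is a minimal sketch of a case-insensitive match. The helper name extractLocation is made up for this example, and the pattern is a slightly stricter variant of the one above (anchored to the start of a line so that Content-Location: is not picked up), so treat it as one possible way to write it rather than the required form.

```php
<?php
// Hypothetical helper: pull the redirect target out of a raw header block.
function extractLocation($httpheader)
{
    // /m lets ^ and $ match at line boundaries; /i ignores the case of the header name.
    if (preg_match('/^(?:Location|URI):[ \t]*(.*)$/mi', $httpheader, $matches)) {
        return trim($matches[1]); // trim() also drops the trailing \r
    }
    return null; // no redirect header found
}

$headers = "HTTP/1.1 301 Moved Permanently\r\nlocation: http://example.com/next\r\nContent-Length: 0";
echo extractLocation($headers), "\n"; // prints http://example.com/next
```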
Upvotes: 3
Reputation: 43619
You have
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
This tells cURL to follow the redirects itself, so curl_getinfo() only describes the final request.
To follow the redirects manually:
function getWebPage($url, $redirectcallback = null){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // handle each redirect ourselves
    curl_setopt($ch, CURLOPT_HEADER, true);          // keep headers so Location can be read
    curl_setopt($ch, CURLOPT_NOBODY, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1) Gecko/20061024 BonEcho/2.0");
    $html = curl_exec($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // Redirect status codes that carry a Location header
    if ($html !== false && in_array($http_code, array(301, 302, 303, 307, 308))) {
        // The header block ends at the first blank line
        list($httpheader) = explode("\r\n\r\n", $html, 2);
        $matches = array();
        if (preg_match('/(Location:|URI:)(.*?)\n/', $httpheader, $matches)) {
            $nurl = trim(array_pop($matches));
            $url_parsed = parse_url($nurl);
            // parse_url() returns false for a malformed URL, which isset() would not catch
            if ($url_parsed !== false) {
                if ($redirectcallback) { // report this hop to the caller
                    $redirectcallback($nurl, $url);
                }
                $html = getWebPage($nurl, $redirectcallback);
            }
        }
    }
    return $html;
}
function trackAllLocations($newUrl, $currentUrl){
    echo $currentUrl.' ---> '.$newUrl."\r\n";
}
getWebPage('some url with redirects', 'trackAllLocations');
Upvotes: 13