Reputation: 5232
I want to check a lot of URLs and only need their status codes.
This is what I have so far:
$handle = curl_init($url -> loc);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_HEADER , true); // we want headers
curl_setopt($handle, CURLOPT_NOBODY , true);
curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($handle);
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
curl_close($handle);
But as soon as the "nobody" option is set to true, the returned status codes are wrong: google.com returns 302 and other sites return 303.
Setting this option to false is not an option because of the performance loss.
Any ideas?
Upvotes: 2
Views: 1995
Reputation: 20654
The default HTTP request method for curl is GET. If you want only the response headers, you can use the HTTP method HEAD.
curl_setopt($handle, CURLOPT_CUSTOMREQUEST, 'HEAD');
As @Dai's answer points out, CURLOPT_NOBODY already makes curl use the HEAD method, so the line above will not change anything.
Another option would be to use fsockopen to open a connection and write the request headers using fwrite. To get the complete header, read the response using fgets until the first occurrence of \r\n\r\n. Since you need only the status code, you just need to read the first 12 characters (for example "HTTP/1.1 200"); fgets($fp, 13) does that, because it reads at most length - 1 bytes.
<?php
// Open a plain TCP connection to the web server (30-second connect timeout).
$fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
if ($fp) {
    // Build a browser-like GET request.
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: www.google.com\r\n";
    $out .= "Accept-Encoding: gzip, deflate, sdch\r\n";
    $out .= "Accept-Language: en-GB,en-US;q=0.8,en;q=0.6\r\n";
    $out .= "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36\r\n";
    $out .= "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    // fgets() reads at most length - 1 bytes, e.g. "HTTP/1.1 200".
    $tmp = explode(' ', fgets($fp, 13));
    echo $tmp[1]; // the status code
    fclose($fp);
} else {
    echo "Connection failed: $errstr ($errno)";
}
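If you would rather not depend on the exact byte count, you could read the whole first line and pull the code out with a regular expression. This is only a rough sketch; parseStatusLine is a name I made up for illustration and is not part of PHP.

```php
<?php
// Hypothetical helper: extract the numeric status code from an HTTP status line.
function parseStatusLine(string $line): ?int
{
    // Matches "HTTP/1.1 302 Found" as well as "HTTP/2 200".
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m) === 1) {
        return (int) $m[1];
    }
    return null; // not a status line
}
```

With the socket code above you would call it as parseStatusLine((string) fgets($fp)); the cast covers the case where fgets returns false.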
Upvotes: 2
Reputation: 155205
cURL's nobody option makes it use the HEAD HTTP verb. I'd wager the majority of non-static web applications in the wild don't handle this verb correctly, hence the different results you're seeing. I suggest making a normal GET request and discarding the response.
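One way to do the "GET and discard" part with cURL might look like the sketch below. The function names discardBody and statusCodeViaGet and the 10-second timeout are my own assumptions, not something from the question. The write callback simply reports every chunk as handled, so nothing is buffered in PHP.

```php
<?php
// Write callback that throws the body away. Returning the chunk length
// tells curl the data was handled, so the transfer continues normally.
function discardBody($handle, string $chunk): int
{
    return strlen($chunk);
}

// Hypothetical helper: GET the URL, ignore the body, return the status code.
function statusCodeViaGet(string $url): int
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_HTTPGET, true);                 // plain GET, not HEAD
    curl_setopt($handle, CURLOPT_WRITEFUNCTION, 'discardBody');  // drop body chunks
    curl_setopt($handle, CURLOPT_TIMEOUT, 10);                   // assumed timeout
    curl_exec($handle);
    $code = (int) curl_getinfo($handle, CURLINFO_HTTP_CODE);
    curl_close($handle);
    return $code; // 0 if no response was received
}
```

The body is still downloaded, so this does not remove the performance cost the question mentions. A write callback that returns a value different from the chunk length makes curl abort the transfer, which could cut the download short, but I have not measured whether that helps in practice.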
Upvotes: 1
Reputation:
I suggest get_headers() instead:
<?php
$url = 'http://www.example.com';
print_r(get_headers($url));
print_r(get_headers($url, 1));
?>
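To turn that array into a single status code, something like the following could work. It is a sketch: finalStatusCode is a hypothetical helper, and it assumes that when redirects are followed the array contains one status line per hop, so the last one is the final result.

```php
<?php
// Hypothetical helper: return the last HTTP status code found in the
// array produced by get_headers(), or null if there is none.
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $value) {
        // With get_headers($url, 1) repeated headers are arrays; skip those.
        if (is_string($value)
            && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $value, $m) === 1) {
            $code = (int) $m[1]; // keep overwriting so the last hop wins
        }
    }
    return $code;
}
```

Usage would be along the lines of $headers = get_headers($url); followed by $code = $headers === false ? null : finalStatusCode($headers);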
Upvotes: 0