Reputation: 51
I have read many questions on this topic. Basically, I'm using a combination of get_headers() and cURL to check whether a URL exists.
$url = "http://www.asdkkk.com";
$headers = get_headers($url);
if (strpos($headers[0], '404') === false) {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_HEADER => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_SSL_VERIFYPEER => false,
        CURLOPT_HTTPHEADER => array("Accept-Language: en-US;q=0.6,en;q=0.4"),
        CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6'
    ));
    $data = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($httpCode != 404) {
        curl_close($ch);
        return $data;
    }
} else {
    echo "URL Not Exists";
}
Both functions return status code 200 for the URL http://www.asdkkk.com. That URL leads to a "page not found" site, but it seems to be hosted, and the response header is not set to 404. I have tried this with other URLs too, not just this one. So how can I accurately determine whether a URL actually exists?
Upvotes: 0
Views: 635
Reputation: 889
I think the issue with your example code is that you are confusing a 404 'Not Found' HTTP response from a server with the case of a URL that doesn't point to any server at all. If there is no server response at all, cURL reports 0 as the HTTP status code rather than 404. Try running the code below and see if it works for your purposes:
$urls = array(
    "http://www.asdkkk.com",
    "http://www.google.com/cantfindthisurl",
    "http://www.google.com",
);
$ch = curl_init();
foreach ($urls as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    $http_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    echo "$http_status for $url <br>";
}
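If you want to act on that difference rather than just print the code, a minimal sketch could look like the following. The function names classify_http_status() and check_url(), the category strings, and the timeout values are my own assumptions for illustration, not part of cURL. Note that this only separates "no response" from "404"; a server that answers 200 for a missing page will still be reported as ok.

```php
<?php
// Hypothetical helper: maps the status code reported by cURL to a coarse category.
function classify_http_status(int $code): string
{
    if ($code === 0) {
        // cURL got no HTTP response at all (e.g. DNS failure, refused connection, timeout).
        return 'unreachable';
    }
    if ($code === 404) {
        // A server answered and said the page does not exist.
        return 'not_found';
    }
    if ($code >= 200 && $code < 400) {
        return 'ok';
    }
    // Any other client or server error (403, 500, ...).
    return 'error';
}

// Hypothetical helper: requests the URL and classifies the outcome.
function check_url(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 5,   // assumed values; tune for your use case
        CURLOPT_TIMEOUT        => 10,
    ));
    curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return classify_http_status($code);
}
```

You would call it as check_url("http://www.google.com/cantfindthisurl") and branch on the returned string.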
Upvotes: 1