Reputation: 12044
I use file_get_contents to fetch remote pages.
Many of the pages return a 404 error with a customized (and heavy) 404 page.
Is there a way to stop and not download the whole page when a 404 header is found?
(Maybe curl or wget can do that?)
Upvotes: 0
Views: 322
Reputation: 3372
I would do the following:
$pageUrl = "http://www.example.com/myfile/which/may/not.exist";

// get_headers() returns the response headers as an array, or false on failure
$headers = get_headers($pageUrl);

// Check the status line before downloading the body
if ($headers !== false && $headers[0] == "HTTP/1.1 200 OK") {
    // OK - download
    $download = file_get_contents($pageUrl);
} elseif ($headers !== false && $headers[0] == "HTTP/1.1 404 Not Found") {
    // NOT OK - show error
}
You could also use strpos() instead of an exact string comparison (see the sketch after the sample output below).
Based on PHP's manual page for get_headers().
Sample output:
Array
(
[0] => HTTP/1.1 200 OK
[1] => Date: Sat, 29 May 2004 12:28:13 GMT
[2] => Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
[3] => Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
[4] => ETag: "3f80f-1b6-3e1cb03b"
[5] => Accept-Ranges: bytes
[6] => Content-Length: 438
[7] => Connection: close
[8] => Content-Type: text/html
)
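A hedged variant of the above, combining both ideas: the strpos() check and the HEAD override via stream_context_set_default() are my additions, not something the answer itself shows. By default get_headers() issues a GET request, so forcing HEAD asks the server to skip the response body entirely.

// Ask get_headers() to send a HEAD request instead of its default GET,
// so the server should not transmit a response body at all
stream_context_set_default(['http' => ['method' => 'HEAD']]);

$pageUrl = "http://www.example.com/myfile/which/may/not.exist";
$headers = get_headers($pageUrl);

// strpos() match on the status line instead of an exact comparison,
// so variants like "HTTP/1.0 200 OK" are accepted too
if ($headers !== false && strpos($headers[0], '200') !== false) {
    // Switch back to GET before actually downloading the body
    stream_context_set_default(['http' => ['method' => 'GET']]);
    $download = file_get_contents($pageUrl);
}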
Upvotes: 0
Reputation: 943591
No, this isn't possible.
HTTP provides some scope for conditional requests (such as If-Modified-Since), but none that trigger on the status code.
The closest you could come would be to make a HEAD request and then, if you don't get an error code back, make a GET request afterwards. You'd probably lose more to having two requests for every good resource than you would gain in not getting the bodies of bad resources.
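A minimal sketch of that HEAD-then-GET pattern using PHP's cURL extension (the URL is illustrative, and treating any status below 400 as "not an error" is my assumption, not part of the answer):

$url = "http://www.example.com/myfile/which/may/not.exist";

// First request: HEAD only, so no body is transferred
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);          // send HEAD instead of GET
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo the response
curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// Second request: fetch the body only if the HEAD request succeeded
if ($status >= 200 && $status < 400) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    curl_close($ch);
}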
Upvotes: 2