Reputation: 75
Setting up curl like this:
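```php
// Method of the (not shown) class that holds $this->domain
public function cURL()
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $this->domain);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);      // give up connecting after 3 seconds
    curl_setopt($ch, CURLOPT_FAILONERROR, true);      // treat HTTP status >= 400 as a failure
    curl_setopt($ch, CURLOPT_USERAGENT, "Useragent");
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow Location: redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 1);           // ...but at most one of them
    $str = curl_exec($ch);
    curl_close($ch);
    return $str;
}

// Called elsewhere in the class:
$str = $this->cURL();
```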
If I pass the URL of an HTML page, all is well, but if I pass a direct link to a .jpg, for example, it returns a load of garbled data.
I'd like to make sure that if a page redirects to, say, a .jpg or .gif, it's ignored and only HTML pages are returned.
I can't find a curl_setopt option that does this.
Any ideas?
-The Swan.
Upvotes: 0
Views: 967
Reputation: 360762
Curl doesn't care whether the content is text (HTML) or binary data (a JPEG); it just returns whatever you tell it to fetch. You've told curl to follow redirects with the CURLOPT_FOLLOWLOCATION option, so it will follow the chain of redirects until it either hits the redirect limit (CURLOPT_MAXREDIRS, which you've set to 1) or gets something to download.
If you don't know ahead of time what the URL might contain, you'll need a workaround, such as first issuing a HEAD request. That returns only the URL's normal HTTP headers, from which you can extract the MIME type of the response (the Content-Type: ... header) and decide whether you want to fetch it.
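A minimal sketch of the HEAD-then-GET approach, assuming the server answers a HEAD request with the same headers it would send for a GET. It reuses one curl handle: CURLOPT_NOBODY makes curl send HEAD (rather than a custom request method, so curl doesn't wait for a body), and CURLOPT_HTTPGET switches the handle back to GET. The function names `isHtmlContentType` and `fetchIfHtml` are hypothetical, and counting `application/xhtml+xml` as HTML is my own assumption.

```php
<?php
// Sketch: probe with HEAD, download with GET only if the final Content-Type is HTML.
// isHtmlContentType() and fetchIfHtml() are hypothetical helper names.

/** True for text/html or application/xhtml+xml, ignoring case and "; charset=..." parameters. */
function isHtmlContentType(?string $contentType): bool
{
    if ($contentType === null || $contentType === '') {
        return false;
    }
    // Keep only the media type in front of any ";" parameters.
    $mime = strtolower(trim(explode(';', $contentType, 2)[0]));
    return $mime === 'text/html' || $mime === 'application/xhtml+xml';
}

/** Returns the page body, or null if the (redirected) URL is not HTML or a request fails. */
function fetchIfHtml(string $url): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 1);
    try {
        // Step 1: HEAD request; CURLOPT_NOBODY tells curl not to fetch a body.
        curl_setopt($ch, CURLOPT_NOBODY, true);
        if (curl_exec($ch) === false) {
            return null; // network error, too many redirects, or HTTP status >= 400
        }
        // Content-Type reported for the transfer (null if the server sent none).
        if (!isHtmlContentType(curl_getinfo($ch, CURLINFO_CONTENT_TYPE))) {
            return null;
        }
        // Step 2: switch the same handle back to GET and download the body.
        curl_setopt($ch, CURLOPT_NOBODY, false);
        curl_setopt($ch, CURLOPT_HTTPGET, true);
        $body = curl_exec($ch);
        return $body === false ? null : $body;
    } finally {
        curl_close($ch);
    }
}
```

Usage would be something like `$str = fetchIfHtml($this->domain);`, which yields the HTML or null. The cost is two round trips per URL, and a server that rejects HEAD (e.g. with 405) is simply treated as "not HTML" by this sketch.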
Alternatively, just fetch the URL and then keep or discard the data based on the MIME type in the full response's headers.
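A rough sketch of the single-request variant: one GET that follows redirects, after which `curl_getinfo($ch, CURLINFO_CONTENT_TYPE)` tells you what you ended up with, and the body is thrown away if it isn't HTML. The helper names `mediaType` and `fetchHtmlOrNull` are hypothetical, and which media types count as HTML is an assumption you may want to adjust.

```php
<?php
// Sketch: a single GET, then keep the body only if the final Content-Type is HTML.
// mediaType() and fetchHtmlOrNull() are hypothetical helper names.

/** Normalises a Content-Type value to a bare, lower-case media type ("" if missing). */
function mediaType(?string $contentType): string
{
    if ($contentType === null) {
        return '';
    }
    return strtolower(trim(explode(';', $contentType, 2)[0]));
}

/** Fetches $url and returns the body if it turned out to be HTML, otherwise null. */
function fetchHtmlOrNull(string $url): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 1);
    try {
        $body = curl_exec($ch);
        if ($body === false) {
            return null; // network error, too many redirects, or HTTP status >= 400
        }
        // Content-Type reported for the transfer, i.e. for the response curl ended on.
        $type = mediaType(curl_getinfo($ch, CURLINFO_CONTENT_TYPE));
        // Toss anything that isn't HTML; note the bytes have already been downloaded.
        return in_array($type, ['text/html', 'application/xhtml+xml'], true) ? $body : null;
    } finally {
        curl_close($ch);
    }
}
```

Compared with a HEAD probe this needs only one round trip, but an image or other large file is downloaded in full before being discarded.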
Upvotes: 1
Reputation: 11779
My idea: use a HEAD request, check whether the Content-Type is one you're interested in (e.g. HTML again), and only then make a GET request for the data.
Set CURLOPT_NOBODY to make curl send a HEAD request.
Upvotes: 0