a_good_swan

Reputation: 75

Making PHP cURL skip binary data like images, video, etc

I'm setting up cURL like this:

  function cURL()
  {
      $ch = curl_init();
      curl_setopt($ch, CURLOPT_URL, $this->domain);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
      curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);      // seconds
      curl_setopt($ch, CURLOPT_FAILONERROR, true);      // treat HTTP status >= 400 as a failure
      curl_setopt($ch, CURLOPT_USERAGENT, "Useragent");
      curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
      curl_setopt($ch, CURLOPT_MAXREDIRS, 1);
      $str = curl_exec($ch);
      return $str;
  }

  // Elsewhere in the same class:
  $str = $this->cURL();

Pass it the URL of an HTML page and all is well, but pass it a direct link to a .jpg, for example, and it returns a load of garbled data.

I'd like to ensure that if a page redirects to, say, a .jpg or .gif, the response is ignored and only HTML pages are returned.

I can't seem to find a curl_setopt option that does this.

Any ideas?

-The Swan.

Upvotes: 0

Views: 967

Answers (2)

Marc B

Reputation: 360762

Curl doesn't care whether the content is text (HTML) or binary garbage (a JPEG); it just returns whatever you tell it to fetch. You've told curl to follow redirects with the CURLOPT_FOLLOWLOCATION option, so it follows the chain of redirects until it either hits the redirect limit (CURLOPT_MAXREDIRS) or gets something to download.

If you don't know ahead of time what the URL might contain, you'd have to use a workaround, such as issuing a HEAD request first. That returns the URL's normal HTTP headers without the body, and from those you can extract the MIME type (Content-Type: ...) of the response and decide whether you want to fetch it.
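A minimal sketch of that HEAD probe, assuming the PHP curl extension is available. The function names probeContentType and isHtmlContentType are made up for illustration, and the list of MIME types treated as HTML is an assumption you may want to adjust:

```php
<?php
// Hypothetical helper: true when a Content-Type header value denotes HTML.
function isHtmlContentType($contentType)
{
    if (!is_string($contentType) || $contentType === '') {
        return false;
    }
    // Drop parameters such as "; charset=UTF-8" before comparing.
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    return in_array($mime, ['text/html', 'application/xhtml+xml'], true);
}

// Hypothetical helper: send a HEAD request and return the Content-Type
// of the final response, or null if the request failed or none was sent.
function probeContentType($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, true);           // HEAD instead of GET
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // same redirect policy as the question
    curl_setopt($ch, CURLOPT_MAXREDIRS, 1);
    $ok = curl_exec($ch);
    $type = ($ok === false) ? null : curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    return $type ?: null;
}
```

You would then only issue the real GET when isHtmlContentType(probeContentType($url)) is true. Bear in mind that some servers answer HEAD differently from GET, or not at all, so treat the probe as a hint rather than a guarantee.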

Or just fetch the URL and then keep or toss the data based on the MIME type in the full response's headers.
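The fetch-then-filter variant might look roughly like this. Again a sketch: fetchHtmlOnly and keepIfHtml are hypothetical names, and the curl options simply mirror the ones in the question:

```php
<?php
// Hypothetical helper: return $body only when $contentType denotes HTML.
function keepIfHtml($body, $contentType)
{
    // Strip parameters such as "; charset=utf-8"; a missing header becomes ''.
    $mime = strtolower(trim(explode(';', (string) $contentType)[0]));
    $isHtml = ($mime === 'text/html' || $mime === 'application/xhtml+xml');
    return $isHtml ? $body : null;
}

// Hypothetical helper: GET $url and return the body, or null when the
// request fails or the final response is not HTML.
function fetchHtmlOnly($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 1);
    $body = curl_exec($ch);
    if ($body === false) {
        return null;
    }
    // Content-Type of the last response in the redirect chain.
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    return keepIfHtml($body, $type);
}
```

This costs only one request, but the whole body is downloaded before it gets thrown away, so a large video is still transferred in full.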

Upvotes: 1

SergeS

Reputation: 11779

My idea: use a HEAD request, check whether the Content-Type is interesting (e.g. HTML), and only then make a GET request for the data.

Set CURLOPT_NOBODY to make cURL send a HEAD request.

Upvotes: 0
