Reputation: 633
Downloading an image using cURL
https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg
When saving this image manually from the browser to the local PC, the size reported by the system is 139,880 bytes.
When downloading it with cURL, the file seems to be damaged and is not recognized as a valid image.
Its size, when downloaded with cURL, is 139,845 bytes, which is smaller than the manually downloaded file.
Digging into the issue further, I found that the server returns the content length in the response headers as
content-length: 139845
This length is identical to what cURL downloaded, so I suspect that cURL closes the transfer once it reaches the (possibly wrong) length announced by the server.
Is there any way to make cURL download the file completely even if the Content-Length header is wrong?
Used code:
//curl ini
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER,0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,20);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.bing.com/');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8');
curl_setopt($ch, CURLOPT_MAXREDIRS, 5); // Good leeway for redirections.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // Many login forms redirect at least once.
curl_setopt($ch, CURLOPT_COOKIEJAR , "cookie.txt");
//curl get
$x='error';
$url='https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg';
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_URL, trim($url));
$exec=curl_exec($ch);
$x=curl_error($ch);
$fp = fopen('test.jpg', 'wb'); // 'wb' overwrites an existing file; 'x' would fail if test.jpg already exists
fwrite($fp, $exec);
fclose($fp);
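For reference, a quick sanity check (just a sketch using curl_getinfo(), run right after curl_exec()) that compares what cURL actually received with the Content-Length announced by the server:

$downloaded = curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD);           // bytes cURL received
$announced  = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD); // value of the Content-Length header
echo "downloaded: $downloaded, announced: $announced, written: " . strlen($exec) . "\n";

Given the response header above, both values should come out as 139,845, i.e. cURL received exactly what the server announced.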
Upvotes: 0
Views: 2677
Reputation: 21665
The server has a bugged implementation of the Accept-Encoding compressed-transfer mechanism.
The response is ALWAYS gzip-compressed, but the server won't tell the client that it's gzip-compressed unless the client sends the Accept-Encoding: gzip header in the request. When the server doesn't tell the client that the body is gzipped, the client won't gzip-decompress it before saving it, hence your corrupted download. Tell cURL to offer gzip compression by setting CURLOPT_ENCODING:
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');
Then the server will tell cURL that the response is gzip-compressed, and cURL will decompress it for you before handing it to PHP.
You should probably tell the server admin about this; it's a serious bug in their web server that corrupts downloads.
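For completeness, a minimal sketch of the whole download with that option set (file name and URL taken from the question; passing an empty string instead of 'gzip' would offer every encoding this cURL build supports):

$ch = curl_init('https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// Offer gzip; the server then declares Content-Encoding: gzip and cURL decodes the body transparently.
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');
$data = curl_exec($ch);
if ($data === false) {
    throw new \RuntimeException('download failed: ' . curl_error($ch));
}
file_put_contents('test.jpg', $data);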
Upvotes: 2
Reputation: 21665
libcurl has an option for that, called CURLOPT_IGNORE_CONTENT_LENGTH. Unfortunately, this option is not natively supported in PHP, but you can trick PHP into setting it anyway by using the correct magic number (which, at least on my system, is 136):
if (!defined('CURLOPT_IGNORE_CONTENT_LENGTH')) {
    define('CURLOPT_IGNORE_CONTENT_LENGTH', 136);
}
if (!curl_setopt($ch, CURLOPT_IGNORE_CONTENT_LENGTH, 1)) {
    throw new \RuntimeException('failed to set CURLOPT_IGNORE_CONTENT_LENGTH! - ' . curl_errno($ch) . ': ' . curl_error($ch));
}
You can find the correct number for your system by compiling and running the following C++ code:
#include <iostream>
#include <curl/curl.h>

int main() {
    std::cout << CURLOPT_IGNORE_CONTENT_LENGTH << std::endl;
}
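For example, assuming the snippet is saved as print_option.cpp and the libcurl development headers are installed, g++ print_option.cpp -o print_option && ./print_option prints the value; no linking against libcurl is needed, since only a header constant is used.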
Upvotes: 0