Reputation: 7681
I want to download a compressed file from a URL using the libcurl C API. I have the following code:
// CurlGet.h
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <curl/curl.h>

/* note: not used below */
struct memory {
    char *response;
    size_t size;
};

size_t callBackWrite(void *data, size_t size, size_t nmemb, void *userp) {
    size_t written = fwrite(data, size, nmemb, (FILE *) userp);
    return written;
}

int curlGetC(const char *url, const char *output_filename) {
    CURL *curl_handle;
    curl_global_init(CURL_GLOBAL_ALL);

    /* init the curl session */
    curl_handle = curl_easy_init();
    if (!curl_handle) {
        throw std::logic_error("You no curl");
    }

    /* set URL to get here */
    curl_easy_setopt(curl_handle, CURLOPT_URL, url);

    /* Switch on full protocol/debug output while testing */
    curl_easy_setopt(curl_handle, CURLOPT_VERBOSE, 1L);

    /* keep the progress meter enabled (set to 1L to disable it) */
    curl_easy_setopt(curl_handle, CURLOPT_NOPROGRESS, 0L);

    /* send all data to this function */
    curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, callBackWrite);

    /* open the output file */
    FILE *f = fopen(output_filename, "wb");
    if (!f) {
        throw std::invalid_argument("You no got file");
    }

    /* write the page body to this file handle */
    curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, f);

    /* get it! */
    curl_easy_perform(curl_handle);

    /* close the output file */
    fclose(f);

    /* cleanup curl stuff */
    curl_easy_cleanup(curl_handle);
    curl_global_cleanup();
    return 0;
}
Using this code to download a web page works as expected, but downloading an omex file (which is really just a zip file with an .omex extension) does not:
#include "CurlGet.h"
#include <iostream>
// works as expected
std::string url1 = "https://isocpp.org/wiki/faq/mixing-c-and-cpp";
std::string output_filename1 = "/mnt/d/libsemsim/semsim/example.html";
curlGetC(url1_.c_str(), output_filename1_.c_str());
// downloaded file is 0 bytes.
std::string url2 = "https://auckland.figshare.com/ndownloader/files/17432333";
std::string output_filename2 = "/mnt/d/libsemsim/semsim/example.omex";
curlGetC(url2_.c_str(), output_filename2_.c_str());
Could anybody suggest how to modify my code so that it downloads the compressed file? The verbose output from the failing download is below:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 52.48.88.255...
* TCP_NODELAY set
* Connected to auckland.figshare.com (52.48.88.255) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=GB; L=London; O=figshare LLP; CN=*.figshare.com
* start date: Mar 20 00:00:00 2019 GMT
* expire date: Jul 9 12:00:00 2020 GMT
* subjectAltName: host "auckland.figshare.com" matched cert's "*.figshare.com"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
* SSL certificate verify ok.
> GET /ndownloader/files/17432333 HTTP/1.1
Host: auckland.figshare.com
Accept: */*
< HTTP/1.1 302 Found
< Date: Sun, 12 Apr 2020 10:43:10 GMT
< Content-Type: application/octet-stream
< Content-Length: 0
< Connection: keep-alive
< Server: nginx
< X-Storage-Protocol: https
< X-Filename: BIOMD0000000204_new.omex
< Location: https://objectext.auckland.ac.nz/figshare/17432333/BIOMD0000000204_new.omex
< X-Storage-Host: objectext.auckland.ac.nz
< X-Storage-File: 17432333/BIOMD0000000204_new.omex
< X-Storage-Bucket: figshare
< Content-Disposition: attachment;filename=BIOMD0000000204_new.omex
< Cache-Control: no-cache, no-store
< Set-Cookie: fig_tracker_client=0975a192-4ec5-4a63-a800-c598eb7ca6b5; Max-Age=31536000; Path=/; expires=Mon, 12-Apr-2021 10:43:10 GMT; secure; HttpOnly
< X-Robots-Tag: noindex
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Strict-Transport-Security: max-age=31536000; includeSubDomains;
< Cache-Control: public, must-revalidate, proxy-revalidate
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Methods: GET, OPTIONS
< Access-Control-Allow-Headers: Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization,Range
< Access-Control-Expose-Headers: Location,Accept-Ranges,Content-Encoding,Content-Length,Content-Range
<
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
* Connection #0 to host auckland.figshare.com left intact
Upvotes: 0
Views: 633
Reputation: 241721
This really has nothing to do with the fact that the target file is compressed. Zip files are archives whose components are compressed individually; it is not possible to decompress a zip file into a single meaningful object. That's different from gzipped tar archives, for example. (However, it is not generally desirable for a user agent to automatically decompress a .tgz file into a .tar file, even though it could.)
Your problem stems from the fact that you didn't provide the full URI for the file. The web server responded by sending a redirect (302) return code. That tells the user agent to make a new request for the resource, using the URI provided in the Location response header.
You need to tell libcurl to follow redirects.
curl_easy_setopt(curl_handle, CURLOPT_FOLLOWLOCATION, 1L);
302 redirects differ from 301 redirects in that the redirection is marked as temporary. The 301 return code suggests to the user agent that it should remember the redirection and not attempt to use the original URL in the future. A 302 response should not be cached; it might, for example, be used to provide the location of what is currently the most recent version of a resource.
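For reference, here is a minimal sketch of what the perform step might look like with redirects enabled and the result actually checked; the CURLOPT_MAXREDIRS cap and the response-code check are additions for illustration, not part of the original code:
/* follow 3xx redirects, but cap them to avoid redirect loops */
curl_easy_setopt(curl_handle, CURLOPT_FOLLOWLOCATION, 1L);
curl_easy_setopt(curl_handle, CURLOPT_MAXREDIRS, 10L);

CURLcode res = curl_easy_perform(curl_handle);
if (res != CURLE_OK) {
    fprintf(stderr, "curl_easy_perform failed: %s\n", curl_easy_strerror(res));
} else {
    long http_code = 0;
    /* status of the final response, i.e. after any redirects were followed */
    curl_easy_getinfo(curl_handle, CURLINFO_RESPONSE_CODE, &http_code);
    fprintf(stderr, "final HTTP status: %ld\n", http_code);
}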
Upvotes: 1
Reputation: 21483
Here is (probably) what happened: you sent a request without an Accept-Encoding header, and the server (unwisely, in my opinion) assumed that since you didn't list any acceptable content encodings, you probably support gzip. That sounds odd, but the proper way to say "I don't support any content encodings" is to send the header Accept-Encoding: identity, which you didn't do. The server therefore answered with Content-Encoding: gzip, which your code ignored, so the gzip-compressed data was written as-is to your output_filename.
To tell curl to deal with encodings automatically (which is the easiest solution the vast majority of the time), just set CURLOPT_ACCEPT_ENCODING to an empty string; this tells curl to attempt the transfer compressed and to automatically decompress the response before writing it:
curl_easy_setopt(curl_handle, CURLOPT_ACCEPT_ENCODING, "");
That should fix your problem. curl will now send a header looking like Accept-Encoding: gzip, deflate, br (the exact compression algorithms offered depend on what your libcurl was compiled to support), and the server will either choose one of those encodings or, if it supports none of them, send the data uncompressed. curl in turn will auto-decompress the data before passing it to your CURLOPT_WRITEFUNCTION callback.
You can find the relevant documentation here: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
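Once the download succeeds, it is easy to confirm that what landed on disk is really a zip/omex archive rather than stray gzip data or an empty file: a well-formed zip file begins with the two magic bytes "PK". A small sketch of such a check; the looksLikeZip helper is hypothetical, not part of the question's code:
#include <cstdio>

/* returns true if the file starts with the "PK" signature found at the start of zip (and .omex) files */
bool looksLikeZip(const char *filename) {
    unsigned char magic[2] = {0};
    FILE *f = fopen(filename, "rb");
    if (!f) return false;
    size_t n = fread(magic, 1, 2, f);
    fclose(f);
    return n == 2 && magic[0] == 'P' && magic[1] == 'K';
}
Calling it on example.omex after the transfer should then return true.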
Upvotes: 0