PolGraphic

Reputation: 3364

Downloading UTF-8 file with libcurl (ANSI works fine)

I am writing a simple file downloader with the help of libcurl. Here's the code for downloading a file from an HTTP server:

static size_t WriteCallback(void *contents, size_t size, size_t nmemb, void *userp) {
    ((std::string*)userp)->append((char*)contents, size * nmemb);
    return size * nmemb;
}

std::wstring result; //result with Polish letters (ą, ę, etc.)
CURL *curl;
CURLcode res;
std::string readBuffer;

curl = curl_easy_init();
ERROR_HANDLE(curl, L"CURL could not be initialized.", MOD_INTERNET);
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
curl_easy_setopt(curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_easy_setopt(curl, CURLOPT_USERPWD, (login + ":" + password).c_str()); //e.g.: "login:password"
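curl_easy_setopt(curl, CURLOPT_POST, 1L); //libcurl expects a long here, not a bool
//curl_easy_setopt(curl, CURLOPT_ENCODING, "UTF-8"); //does not change anything: this option sets Accept-Encoding (compression such as gzip), not the charset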
res = curl_easy_perform(curl);
curl_easy_cleanup(curl);

result = C::toWString(readBuffer);
return res == CURLE_OK; //CURLE_OK (0) = success

It works fine when the file I download is encoded as ANSI (according to, e.g., Notepad++). But when I download a UTF-8 file (UTF-8 without BOM), some characters (e.g. Polish letters) come out wrong because of an encoding problem.

For example, I ran the code for two files containing the same text ("to jest teść to") and saved each result to a std::wstring. result comes from the ANSI file and is correct; result2 (the problematic one) comes from the UTF-8 version and shows the Polish letters incorrectly in the debugger.

Both files display the correct text when opened on the server with, e.g., Notepad++.

So how can I download the UTF-8 file content with libcurl and store it in a std::wstring with the proper encoding (so that the Visual Studio debugger shows it as "to jest teść to")?

Upvotes: 0

Views: 2582

Answers (2)

Remy Lebeau

Reputation: 596121

This is not a libcurl issue. You are storing the raw data in a std::string and then converting it to a std::wstring after the download has finished. You have to look at the charset reported in the HTTP response (the charset parameter of the Content-Type header) and decode the data to std::wstring accordingly. C::toWString() has no concept of charsets, so you should use something else, such as iconv or ICU. Or, if you know the data is always UTF-8, do the conversion manually (UTF conversions are easy to code by hand), or use C++11's built-in UTF conversions via the std::wstring_convert class (deprecated since C++17, but still available).
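For the case where the data is known to be UTF-8, here is a minimal sketch of the hand-coded conversion mentioned above. utf8ToWString is a hypothetical helper name, not part of libcurl or the standard library. It assumes wchar_t holds UTF-16 (as on Windows) or UTF-32 (as on most other platforms), and it replaces malformed input with U+FFFD instead of throwing.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes (e.g. libcurl's readBuffer) into std::wstring.
// Assumes wchar_t holds UTF-16 (Windows) or UTF-32 (most other platforms);
// malformed or truncated sequences become U+FFFD instead of throwing.
std::wstring utf8ToWString(const std::string& in) {
    const wchar_t kReplacement = static_cast<wchar_t>(0xFFFD);
    std::wstring out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else { out.push_back(kReplacement); ++i; continue; }  // not a valid lead byte

        if (i + len > in.size()) { out.push_back(kReplacement); break; }  // truncated at end

        // Each continuation byte must look like 10xxxxxx and contributes 6 bits.
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (cont & 0x3F);
        }

        // Reject overlong forms, values above U+10FFFF and UTF-16 surrogate values.
        static const char32_t kMin[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (!ok || cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(kReplacement);
            ++i;  // resynchronise on the next byte
            continue;
        }
        i += len;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: encode the code point as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

In the question's code this would replace the final conversion, i.e. result = utf8ToWString(readBuffer);. Decoding by hand avoids the deprecated std::wstring_convert; on Windows only, MultiByteToWideChar with CP_UTF8 is another option.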

Upvotes: 2

Daniel Stenberg

Reputation: 58004

libcurl won't convert or translate the contents for you. It will deliver the exact bytes to your application that the server sent out.
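To see what "the exact bytes" means for the question's text, the sketch below dumps a buffer in hex. hexBytes is a hypothetical helper. The comparison assumes the "ANSI" file uses Windows-1250, the usual ANSI code page for Polish, where ś and ć take one byte each (0x9C and 0xE6). In UTF-8 each of them takes two bytes, so any conversion that treats the buffer as single-byte ANSI text produces two wrong characters for each such letter.

```cpp
#include <string>

// Hypothetical helper: format every byte of a buffer as two uppercase hex digits,
// separated by spaces, so the raw bytes libcurl delivered can be inspected.
std::string hexBytes(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        if (!out.empty()) out.push_back(' ');
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0F]);  // low nibble
    }
    return out;
}

// "teść" as stored in each file (assuming the ANSI file is Windows-1250):
//   hexBytes("te\x9C\xE6")         -> "74 65 9C E6"        (4 bytes, one per letter)
//   hexBytes("te\xC5\x9B\xC4\x87") -> "74 65 C5 9B C4 87"  (6 bytes, ś and ć take two each)
```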

You can use HTTP Accept headers (such as Accept-Charset) to influence what the server responds with, but then you need to check the received charset and convert the data yourself if you're not satisfied with what you get.
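A minimal sketch of checking the received charset follows. charsetFromContentType is a hypothetical helper; the libcurl calls in the comments (curl_slist_append, CURLOPT_HTTPHEADER, curl_easy_getinfo with CURLINFO_CONTENT_TYPE) are real libcurl API, but the parsing is deliberately simplified and does not handle every form a Content-Type header can take (for example, spaces around "=").

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Optionally ask the server for UTF-8 before the transfer (it may ignore this):
//   curl_slist* hdrs = curl_slist_append(nullptr, "Accept-Charset: utf-8");
//   curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
// After curl_easy_perform() and before curl_easy_cleanup(), read what was declared:
//   char* ct = nullptr;
//   curl_easy_getinfo(curl, CURLINFO_CONTENT_TYPE, &ct);  // ct is NULL if absent
//   std::string charset = ct ? charsetFromContentType(ct) : "";
//   curl_slist_free_all(hdrs);  // once the handle no longer uses the list

// Hypothetical helper: return the lower-cased charset parameter of a
// Content-Type value, e.g. "text/plain; charset=UTF-8" -> "utf-8",
// or "" if none is present. Simplified: no spaces allowed around '='.
std::string charsetFromContentType(const std::string& contentType) {
    std::string lower(contentType);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::size_t pos = lower.find(key);
    if (pos == std::string::npos) return "";
    pos += key.size();

    // The value runs until the next parameter separator (or the end of the string).
    std::size_t end = lower.find(';', pos);
    std::string value = lower.substr(pos, end == std::string::npos ? std::string::npos : end - pos);

    // Trim surrounding whitespace and optional double quotes.
    auto notSpace = [](unsigned char c) { return !std::isspace(c); };
    value.erase(value.begin(), std::find_if(value.begin(), value.end(), notSpace));
    value.erase(std::find_if(value.rbegin(), value.rend(), notSpace).base(), value.end());
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
        value = value.substr(1, value.size() - 2);
    return value;
}
```

If the result is "utf-8", readBuffer can be decoded as UTF-8; if it is empty, the server declared no charset and the application has to fall back to an encoding it knows to be correct for that server.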

Upvotes: 1
