Reputation: 3364
I am writing a simple file downloader with the help of libcurl. Here's the code for downloading a file from an HTTP server:
static size_t WriteCallback(void *contents, size_t size, size_t nmemb, void *userp) {
    // Append the received chunk (raw bytes, no charset conversion) to the std::string buffer.
    ((std::string*)userp)->append((char*)contents, size * nmemb);
    return size * nmemb;
}
std::wstring result; //result with polish letters (ą, ę etc.)
CURL *curl;
CURLcode res;
std::string readBuffer;
curl = curl_easy_init();
ERROR_HANDLE(curl, L"CURL could not been inited.", MOD_INTERNET);
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
curl_easy_setopt(curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_easy_setopt(curl, CURLOPT_USERPWD, (login + ":" + password).c_str()); //e.g.: "login:password"
curl_easy_setopt(curl, CURLOPT_POST, true);
//curl_easy_setopt(curl, CURLOPT_ENCODING, "UTF-8"); //does not change anything
res = curl_easy_perform(curl);
curl_easy_cleanup(curl);
result = C::toWString(readBuffer);
return res == 0; //0 = OK
It works fine when the file I want to download is encoded as ANSI (according to e.g. Notepad++). But when I try to download a UTF-8 file (UTF-8 without BOM), some characters (e.g. Polish letters) come out wrong because of an encoding problem.
For example, I ran the code on two files containing the same text ("to jest teść to") and saved each result to a std::wstring. result comes from the ANSI file and result2 (the problematic one) comes from the UTF-8 version.
Both files, when opened on the server with e.g. Notepad++, display the right text.
So, how can I get the content of a UTF-8 file with libcurl and save it to a std::wstring with the proper encoding (so that the Visual Studio debugger shows it as "to jest teść to")?
Upvotes: 0
Views: 2582
Reputation: 596121
This is not a libcurl issue. You are storing the raw data in a std::string and then converting that to a std::wstring after the download is finished. You have to look at the charset reported in the HTTP response and decode the data to std::wstring accordingly. C::toWString() has no concept of charsets, so you should use something else, like iconv or ICU. Or, if you know the data is always UTF-8, do the conversion manually (UTF conversions are easy to code by hand), or use C++11's built-in UTF conversions via the std::wstring_convert class.
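As a rough illustration of the "do the conversion manually" option, here is a minimal sketch of a UTF-8 decoder that produces a std::wstring. The name utf8ToWString is hypothetical (it is not part of libcurl or of the asker's C class), and the code assumes the downloaded bytes really are UTF-8. It emits UTF-16 surrogate pairs where wchar_t is 16 bits wide (as on Windows) and one wchar_t per code point elsewhere. It rejects malformed lead and continuation bytes but does not check for overlong encodings, so treat it as a starting point rather than a hardened implementation. Note also that std::wstring_convert is deprecated as of C++17, which is one reason to consider a hand-written or library-based conversion.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes a UTF-8 byte string into a std::wstring.
// Assumption: the input is UTF-8; malformed input throws std::runtime_error.
std::wstring utf8ToWString(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // decoded Unicode code point
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and carries 6 bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

In the question's code, this would replace the call to C::toWString(readBuffer) when the response is known to be UTF-8, e.g. result = utf8ToWString(readBuffer);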
Upvotes: 2
Reputation: 58004
libcurl won't convert or translate the contents for you. It will deliver the exact bytes to your application that the server sent out.
You can use HTTP Accept headers, etc., to influence what the server sends back, but you then need to check the received charset and convert the data yourself if you're not satisfied with what you get.
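To make "check the received charset" a bit more concrete, here is a small sketch that pulls the charset parameter out of a Content-Type header value. After curl_easy_perform() succeeds, libcurl can report that header value through curl_easy_getinfo() with CURLINFO_CONTENT_TYPE (the returned pointer may be NULL if the server sent no Content-Type). The function name charsetFromContentType is hypothetical, and the parsing is deliberately simplified: it lower-cases the value and does not handle every corner of the HTTP header grammar.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: extracts the charset parameter from a Content-Type
// value such as "text/plain; charset=UTF-8". Returns the charset in lower
// case, or an empty string when no charset parameter is present.
std::string charsetFromContentType(const std::string& contentType) {
    std::string lower(contentType);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    const std::size_t pos = lower.find(key);
    if (pos == std::string::npos)
        return "";

    // The value runs until the next parameter separator or whitespace.
    const std::size_t begin = pos + key.size();
    const std::size_t end = lower.find_first_of("; \t", begin);
    std::string value = lower.substr(
        begin, end == std::string::npos ? std::string::npos : end - begin);

    // Strip optional surrounding double quotes: charset="utf-8".
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
        value = value.substr(1, value.size() - 2);
    return value;
}
```

If the result is "utf-8", the buffer can be decoded as UTF-8; for other values (or an empty result) the application has to pick a suitable decoder or a default, since libcurl only delivers the bytes as sent.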
Upvotes: 1