Reputation: 137
My C++ code uses Boost to convert between encodings.
If I compile and run the code under Cygwin, it works fine, but if I compile it directly from the Windows command line (cmd) with MinGW-w64 or MSVC 11, the following call throws invalid_charset_error:
boost::locale::conv::between(encheckbeg, encheckend, consoleEncoding, getCodingName(codingMethod));
encheckbeg and encheckend are pointers to char. consoleEncoding is a C string whose value can be "Big5" or "UTF-8". getCodingName returns a C string containing a charset name.
When getCodingName returns "UTF-16LE" or "UTF-16BE", I get the exception. I have also tested other charset names such as "Big5", "GB18030", and "UTF-8", and boost::locale::conv::between recognizes all of them, so I believe the problem is specific to UTF-16.
Does Boost's charset conversion depend on the OS locale mechanism, which would explain the problem above? Why doesn't it use ICU to convert UTF-16? And how can I solve this problem?
Upvotes: 1
Views: 1476
Reputation: 3024
Boost.Locale is not a header-only library, and its charset conversion has 3 implementations: iconv, ICU, and wconv (which uses the Windows API).
wconv is the default choice when you build Boost.Locale with MSVC.
Unfortunately, the Windows API functions that wconv uses to perform the conversion, such as MultiByteToWideChar, do not support UTF-16 (see the API documentation). I think the reason is that wchar_t strings (LPWSTR) are already UTF-16 on Windows.
A possible solution is to add extra code for UTF-16, for example:
#include <boost/locale.hpp>
#include <codecvt>
#include <locale>
#include <string>

std::string mbcs = std::string("...");
std::wstring wstr = boost::locale::conv::to_utf<wchar_t>(mbcs, "Big5"); // for Big5/GBK...
//wstr = boost::locale::conv::utf_to_utf<wchar_t>(utf8str); // for UTF-8
std::wstring_convert<std::codecvt_utf16<wchar_t>> utf16conv; // for UTF-16BE (big-endian is the default mode)
//std::wstring_convert<std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>> utf16conv; // for UTF-16LE
std::string utf16str = utf16conv.to_bytes(wstr); // raw UTF-16 bytes, no BOM
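The snippet above converts *to* UTF-16, but in the question's call the UTF-16 name is passed as the source charset, so the input is UTF-16 and the output is the console encoding. Also, std::wstring_convert and std::codecvt_utf16 are deprecated since C++17. A minimal alternative sketch, assuming the input is raw UTF-16 bytes whose byte order you already know: decode the UTF-16 into UTF-8 by hand, then give the UTF-8 to boost::locale::conv::between with "UTF-8" as the source charset, which the question reports works with wconv. Utf16Order and utf16_bytes_to_utf8 are hypothetical names for this sketch, not part of Boost or the standard library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper type for this sketch: byte order of the UTF-16 input.
enum class Utf16Order { LittleEndian, BigEndian };

// Append the Unicode code point cp to out, encoded as UTF-8.
// Assumes cp is a valid scalar value (not a surrogate, at most 0x10FFFF).
static void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decode raw UTF-16 bytes in [begin, end) into UTF-8.
// Throws std::runtime_error on an odd byte count or a malformed surrogate.
// A leading BOM is not stripped; it is passed through as U+FEFF.
std::string utf16_bytes_to_utf8(const char* begin, const char* end, Utf16Order order) {
    const std::size_t n = static_cast<std::size_t>(end - begin);
    if (n % 2 != 0) throw std::runtime_error("odd number of bytes in UTF-16 input");

    // Read the i-th 16-bit code unit using the requested byte order.
    auto unit = [&](std::size_t i) -> char32_t {
        const unsigned char b0 = static_cast<unsigned char>(begin[2 * i]);
        const unsigned char b1 = static_cast<unsigned char>(begin[2 * i + 1]);
        return order == Utf16Order::LittleEndian ? char32_t(b0 | (b1 << 8))
                                                 : char32_t((b0 << 8) | b1);
    };

    std::string out;
    const std::size_t units = n / 2;
    for (std::size_t i = 0; i < units; ++i) {
        char32_t cu = unit(i);
        if (cu >= 0xD800 && cu <= 0xDBFF) {  // high surrogate: must be followed by a low one
            if (i + 1 >= units) throw std::runtime_error("truncated surrogate pair");
            const char32_t lo = unit(++i);
            if (lo < 0xDC00 || lo > 0xDFFF) throw std::runtime_error("invalid low surrogate");
            cu = 0x10000 + ((cu - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cu >= 0xDC00 && cu <= 0xDFFF) {  // low surrogate with no preceding high one
            throw std::runtime_error("unpaired low surrogate");
        }
        append_utf8(out, cu);
    }
    return out;
}
```

With the question's variables and little-endian input, usage would look like std::string utf8 = utf16_bytes_to_utf8(encheckbeg, encheckend, Utf16Order::LittleEndian); followed by boost::locale::conv::between(utf8, consoleEncoding, "UTF-8"). Going through UTF-8 keeps the UTF-16 handling out of wconv entirely, at the cost of one extra copy of the text.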
Alternatively, you can build Boost.Locale with ICU support. Just remember to build ICU first, and ship the required ICU runtime libraries and data files with your program.
Upvotes: 1