Shan-Hung Hsu

Reputation: 137

Why can't I convert UTF-16 text to another encoding on Windows using boost::locale::conv::between?

My C++ code uses Boost to convert between encodings.

If I compile and run the code under Cygwin, it works fine, but if I compile it directly from the Windows command line (cmd) with mingw-w64 or MSVC11, the following code throws invalid_charset_error.

boost::locale::conv::between( encheckbeg, encheckend, consoleEncoding,
    getCodingName(codingMethod) )

encheckbeg and encheckend are pointers to char. consoleEncoding is a C string; it can be "Big5" or "UTF-8". getCodingName returns a C string containing the charset name.

When getCodingName returns "UTF-16LE" or "UTF-16BE", I get the exception. I have tested other charset names such as "Big5", "GB18030", and "UTF-8", and boost::locale::conv::between recognizes them, so I believe the problem is specific to UTF-16.

Does Boost's charset conversion depend on the OS locale mechanism, and is that why this problem appears? Why doesn't it use ICU to convert UTF-16? And how do I solve this problem?

Upvotes: 1

Views: 1476

Answers (1)

Beck Yang

Reputation: 3024

Boost Locale is not a header-only library. It has three conversion backends:

  • ICU: use ICU4C library
  • iconv: use iconv library
  • wconv: use Windows API

wconv is the default choice when you build Boost Locale with MSVC. Unfortunately, the Windows APIs it uses to perform the conversion, such as MultiByteToWideChar, do not support UTF-16 as a code page. (Take a look at the API description. I think the reason is that wchar_t (LPWSTR) is already UTF-16.)

A possible solution is to add extra code for UTF-16, for example:

#include <codecvt>  // std::codecvt_utf16 (deprecated since C++17)
#include <locale>   // std::wstring_convert
#include <string>
#include <boost/locale.hpp>

std::string mbcs = std::string("...");
std::wstring wstr = boost::locale::conv::to_utf<wchar_t>(mbcs, "Big5"); // for Big5/GBK...
//wstr = boost::locale::conv::utf_to_utf<wchar_t>(utf8str); // for UTF-8
std::wstring_convert<std::codecvt_utf16<wchar_t>> utf16conv; // for UTF-16BE (the default byte order)
//std::wstring_convert<std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>> utf16conv; // for UTF-16LE
std::string utf16str = utf16conv.to_bytes(wstr);
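
The call in the question reads UTF-16 input (the last argument of between is the source charset), so the reverse direction is needed as well. Below is a minimal sketch that uses only the standard library and avoids std::wstring_convert. The names Utf16Order, utf16_from_bytes, and utf16_to_bytes are hypothetical helpers, not part of Boost Locale, and the code only reorders bytes: it does not validate surrogate pairs or strip a byte order mark.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper type: which byte of each 16-bit code unit comes first.
enum class Utf16Order { Little, Big };

// Decode raw UTF-16LE/BE bytes into 16-bit code units.
// Throws std::invalid_argument if the byte count is odd.
std::u16string utf16_from_bytes(const std::string& bytes, Utf16Order order) {
    if (bytes.size() % 2 != 0) {
        throw std::invalid_argument("UTF-16 input must have an even byte count");
    }
    std::u16string units;
    units.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        const unsigned b0 = static_cast<unsigned char>(bytes[i]);
        const unsigned b1 = static_cast<unsigned char>(bytes[i + 1]);
        const unsigned unit = (order == Utf16Order::Little)
                                  ? (b0 | (b1 << 8))
                                  : ((b0 << 8) | b1);
        units.push_back(static_cast<char16_t>(unit));
    }
    return units;
}

// Encode 16-bit code units as raw UTF-16LE/BE bytes.
std::string utf16_to_bytes(const std::u16string& units, Utf16Order order) {
    std::string bytes;
    bytes.reserve(units.size() * 2);
    for (char16_t unit : units) {
        const char lo = static_cast<char>(unit & 0xFF);
        const char hi = static_cast<char>((unit >> 8) & 0xFF);
        if (order == Utf16Order::Little) {
            bytes.push_back(lo);
            bytes.push_back(hi);
        } else {
            bytes.push_back(hi);
            bytes.push_back(lo);
        }
    }
    return bytes;
}
```

On Windows, where wchar_t is 16 bits wide, the decoded code units could be copied into a std::wstring, for example std::wstring(units.begin(), units.end()), and then passed to boost::locale::conv::from_utf(wstr, "Big5") to reach the console encoding. This assumes the wconv backend handles the target charset, which the question suggests it does for Big5 and UTF-8.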

Of course, you can also build Boost Locale with ICU. Just remember to build ICU first and ship the required runtime libraries and files with your program.

Upvotes: 1
