Dmitry

Reputation: 126

Chinese conversion in MultiByteToWideChar

I'm trying to display Chinese text in MessageBoxW, but I can't correctly convert it from UTF-8 to wchar_t. The original wchar_t Chinese string is displayed correctly. I tried different MultiByteToWideChar flags, but the result was the same. What is the reason for the incorrect conversion?

Upvotes: 0

Views: 790

Answers (1)

Mark Tolonen

Reputation: 177610

char text[] = "文本" only contains UTF-8 if the source file is encoded in UTF-8 and the compiler decodes it as such. Since your title string displays correctly, your source file is in the default legacy Chinese encoding on Windows. The text string therefore contains bytes in that encoding, not UTF-8, so MultiByteToWideChar cannot convert it correctly with CP_UTF8. If you set the flag that checks for invalid characters, the function returns zero, which is what happens when the input isn't really UTF-8:

int ret = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, text, -1, wtext, 1000);
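To confirm which bytes the compiler actually stored in text, you can dump them in hex. This is only a minimal sketch; hex_dump is a hypothetical helper name, not a Windows API:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of any library): writes the bytes of s,
   excluding the terminator, into out as uppercase hex pairs separated by
   single spaces. Returns 0 on success, -1 if out is too small. */
int hex_dump(const char *s, char *out, size_t out_size)
{
    size_t len = strlen(s);
    /* Each byte needs "XX " (3 chars); the last space slot holds '\0'. */
    size_t needed = (len == 0) ? 1 : len * 3;
    if (out_size < needed)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < len; ++i) {
        /* Go through unsigned char so bytes >= 0x80 are not sign-extended. */
        snprintf(out + i * 3, out_size - i * 3,
                 (i + 1 < len) ? "%02X " : "%02X",
                 (unsigned)(unsigned char)s[i]);
    }
    return 0;
}
```

If the dump of text reads E6 96 87 E6 9C AC, the string is UTF-8. Any other byte sequence means the compiler stored the characters in a different encoding.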

The Microsoft compiler has options to specify source and execution character set, and a /utf-8 option (recommended):

/source-charset:<iana-name>|.nnnn      set source character set  
/execution-charset:<iana-name>|.nnnn   set execution character set  
/utf-8                                 set source and execution character set to UTF-8

There are several ways to fix this. #2 and #3 assume the Microsoft compiler; other compilers may vary.

  1. Use char text[] = u8"文本"; since your existing default encoding supports Chinese. The source characters will be interpreted in that encoding and then re-encoded as UTF-8 by this notation. If the source is sent to someone with a different OS default encoding, it will not work for them.
  2. Re-save the source as UTF-8 with BOM. The MS compiler will detect the BOM (byte order mark, used as a UTF-8 signature) and process the source as if /utf-8 was specified. text will contain UTF-8 bytes, and the title will still display correctly.
  3. Re-save the source as UTF-8 (no BOM) and compile with the /utf-8 switch, which tells the compiler to decode the source as UTF-8 instead of the default encoding.
  4. Use ASCII-only source and escape codes to specify the Chinese characters explicitly.

Example of #4 that will compile correctly no matter the OS default encoding:

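#include <windows.h>

int main() {
    // UTF-8 bytes of U+6587 U+672C, escaped so the source stays ASCII-only.
    char text[] = "\xe6\x96\x87\xe6\x9c\xac";
    wchar_t wtext[1000];
    // Returns 0 on failure; MB_ERR_INVALID_CHARS rejects invalid UTF-8.
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, text, -1, wtext, 1000) == 0)
        return 1;
    // The title uses universal character names, which are also ASCII-only.
    MessageBoxW(NULL, wtext, L"\u6a19\u984c", MB_OK);
    return 0;
}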
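The escaped bytes in text above can be derived from the code points (U+6587 and U+672C) by encoding them as UTF-8. Here is a rough sketch of that step; utf8_escape is a hypothetical helper name I made up for illustration, and it has nothing to do with the Windows API:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: encodes one Unicode code point as UTF-8 and writes
   the bytes as C hex escapes (for example \xe6\x96\x87) into out.
   Returns the number of UTF-8 bytes (1 to 4), or 0 if the code point is
   invalid or out is too small. */
int utf8_escape(unsigned long cp, char *out, size_t out_size)
{
    unsigned char b[4];
    int n;

    if (cp < 0x80) {
        b[0] = (unsigned char)cp;
        n = 1;
    } else if (cp < 0x800) {
        b[0] = (unsigned char)(0xC0 | (cp >> 6));
        b[1] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid code points in UTF-8 */
        b[0] = (unsigned char)(0xE0 | (cp >> 12));
        b[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        b[2] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFF) {
        b[0] = (unsigned char)(0xF0 | (cp >> 18));
        b[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        b[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        b[3] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        return 0; /* beyond the Unicode range */
    }

    /* Each byte becomes 4 characters (\xNN), plus one terminator. */
    if (out_size < (size_t)n * 4 + 1)
        return 0;
    for (int i = 0; i < n; ++i)
        snprintf(out + i * 4, 5, "\\x%02x", (unsigned)b[i]);
    return n;
}
```

Calling it with 0x6587 and then 0x672C yields the two three-byte groups used in text. One thing to watch for when pasting such escapes into a literal: a \x escape consumes every following hex digit, so don't place a literal hex digit character directly after one.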

Upvotes: 1
