Reputation: 126
I'm trying to display Chinese text in MessageBoxW, but I can't convert it correctly from UTF-8 to wchar_t. The same Chinese text written directly as a wchar_t string is displayed correctly.
I tried different MultiByteToWideChar flags, but the result is the same. What is the reason for the incorrect conversion?
Upvotes: 0
Views: 790
Reputation: 177610
char text[] = "文本" is only UTF-8 if the source file is encoded in UTF-8. Since your title string displays correctly, your source file is in the default Chinese legacy encoding on Windows. The text string therefore contains bytes in that encoding, not UTF-8, so MultiByteToWideChar fails. You can confirm this by setting the flag that checks for invalid characters: the function then returns zero when the input isn't really UTF-8:
int ret = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, text, -1, wtext, 1000);
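To make the failure concrete, here is a minimal portable sketch of a UTF-8 well-formedness check. It is not how MultiByteToWideChar is implemented; it only illustrates why legacy-encoded bytes are rejected as UTF-8. The function name is_valid_utf8 and the two sample arrays are hypothetical, and the legacy bytes assume code page 936 (GBK), which may not be the code page on the asker's machine.

```c
#include <stddef.h>

/* "文本" (U+6587 U+672C) encoded as UTF-8. */
static const unsigned char utf8_sample[] = {0xE6, 0x96, 0x87, 0xE6, 0x9C, 0xAC};
/* The same text in code page 936 (GBK); the code page is an assumption. */
static const unsigned char gbk_sample[] = {0xCE, 0xC4, 0xB1, 0xBE};

/* Returns 1 if the n bytes at s are well-formed UTF-8, otherwise 0.
   Checks lead/continuation structure, overlong forms, surrogates
   and the U+10FFFF upper limit. */
int is_valid_utf8(const unsigned char *s, size_t n) {
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp;

        if (c < 0x80) {                 /* ASCII: one byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) { /* 110xxxxx: two-byte sequence */
            len = 2; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) { /* 1110xxxx: three-byte sequence */
            len = 3; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) { /* 11110xxx: four-byte sequence */
            len = 4; cp = c & 0x07;
        } else {
            return 0;                   /* stray continuation or invalid lead byte */
        }

        if (n - i < len) return 0;      /* sequence is truncated */

        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0; /* must be 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if ((len == 2 && cp < 0x80) ||
            (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000)) return 0;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;

        i += len;
    }
    return 1;
}
```

Calling is_valid_utf8(utf8_sample, sizeof utf8_sample) accepts the input, while the GBK bytes are rejected at the second byte: 0xCE announces a two-byte sequence, but 0xC4 is not a continuation byte.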
The Microsoft compiler has options to specify the source and execution character sets, and a /utf-8 option (recommended):
/source-charset:<iana-name>|.nnnn set source character set
/execution-charset:<iana-name>|.nnnn set execution character set
/utf-8 set source and execution character set to UTF-8
There are multiple options to fix this. #2 and #3 assume the Microsoft compiler. Other compilers may vary.
1. Use char text[] = u8"文本"; since your existing default encoding supports Chinese. The source characters will be interpreted in that encoding and then re-encoded in UTF-8 with this notation. If the source is sent to someone with a different OS default encoding, it will not work for them.
2. … /utf-8 was specified. text will contain UTF-8 bytes. Title will display correctly.
3. … /utf-8 switch to inform the compiler to decode the source as UTF-8 instead of the default encoding.
4. Write the non-ASCII characters as escape codes, so the source file contains only ASCII.
Example of #4 that will compile correctly no matter the OS default encoding:
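#include <windows.h>

int main() {
    // UTF-8 bytes of U+6587 U+672C as hex escapes, so the source stays pure ASCII.
    char text[] = "\xe6\x96\x87\xe6\x9c\xac";
    wchar_t wtext[1000];
    // -1 means the input is null-terminated; 1000 is the size of wtext in wide characters.
    MultiByteToWideChar(CP_UTF8, 0, text, -1, wtext, 1000);
    // The title uses \u escapes for the same reason.
    MessageBoxW(NULL, wtext, L"\u6a19\u984c", MB_OK);
    return 0;
}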
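As a variation, the u8 prefix from #1 can be combined with the escape idea from #4. The following is a sketch, assuming a C11 or later compiler; the names text_utf8 and u8_literal_matches_hex_escapes are hypothetical. The \u universal character names keep the source ASCII-only, and the u8 prefix asks the compiler to produce UTF-8 bytes regardless of the execution character set.

```c
#include <string.h>

/* U+6587 U+672C written as universal character names: the source file stays
   pure ASCII, and the u8 prefix requests UTF-8 encoding of the literal. */
static const char text_utf8[] = u8"\u6587\u672c";

/* Hypothetical helper: returns 1 if the u8 literal holds exactly the same
   bytes (including the terminating null) as the hand-written hex escapes. */
int u8_literal_matches_hex_escapes(void) {
    static const char expected[] = "\xe6\x96\x87\xe6\x9c\xac";
    return sizeof text_utf8 == sizeof expected &&
           memcmp(text_utf8, expected, sizeof expected) == 0;
}
```

The array text_utf8 could then be passed to MultiByteToWideChar with CP_UTF8 in place of the hex-escaped string. Note that in C++20 a u8 literal has type const char8_t[], so this exact form applies to C (or to C++ before C++20).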
Upvotes: 1