Reputation: 2057
I have a Visual C++ application in Visual Studio 9.0. We built the application with "Unicode Character Set" as the Character Set. We use the A2T conversion macro, but multibyte characters (Korean text) are not converted correctly. I looked at the code for A2T, and it passes CP_THREAD_ACP as the first parameter to MultiByteToWideChar. When I use CP_UTF8 with the same API, I get the correct results. The comment for CP_THREAD_ACP says "current thread's ANSI code page". Since I built the code with the Unicode Character Set, I don't understand why A2T does not use UTF-8.
Upvotes: 1
Views: 1766
Reputation: 88155
The A2T macro converts a string encoded in the relevant ANSI code page to a TCHAR string in the relevant TCHAR encoding. Since you've enabled 'Unicode character Set', TCHAR is wchar_t and the encoding is UTF-16, so A2T converts strings from the ANSI code page to UTF-16. (If you set the program to use ANSI instead of Unicode, TCHAR is char, the encoding is the ANSI code page encoding, and A2T should become a no-op.)
The fact that CP_UTF8 produces the correct conversion shows that your strings are not encoded in an ANSI code page; they are UTF-8. The ANSI code page cannot be set to UTF-8 (only newer Windows 10 releases added an opt-in for this), so A2T is not the appropriate method for this conversion.
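For illustration, here is a portable sketch of the kind of conversion MultiByteToWideChar(CP_UTF8, ...) performs: decoding UTF-8 bytes into UTF-16 code units. The function name utf8_to_utf16 is hypothetical, and it returns std::u16string rather than std::wstring so it builds on any compiler; on Windows, wchar_t is 16 bits and holds UTF-16, so the result corresponds to what the wide APIs expect. Unlike the real API (see its MB_ERR_INVALID_CHARS flag), this sketch simply throws on malformed input, and it is meant to show the technique, not to replace MultiByteToWideChar.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UTF-16 code units.
// Throws std::invalid_argument on malformed UTF-8 (the Windows API has its own rules).
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte tells us how many bytes this code point occupies.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) throw std::invalid_argument("invalid continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong forms, UTF-16 surrogate values and values above U+10FFFF.
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000) {
            // Basic Multilingual Plane (this includes Hangul): one UTF-16 unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

For example, the Korean text 한국 in UTF-8 is the bytes ED 95 9C EA B5 AD, and utf8_to_utf16 turns them into the two UTF-16 code units U+D55C and U+AD6D. Interpreting the same bytes in an ANSI code page, as A2T does, will generally yield different, wrong characters.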
You need to work out what determines the encoding of the input strings in your program and what output encoding you need, and then choose an appropriate conversion routine.
Note that you're not just looking for a routine that converts between the encodings in use right now on your machine, with the program in the particular configuration you happen to be using. You're looking for a routine that uses the appropriate encodings under any supported configuration on any supported machine. In other words, the routine you choose must perform the right conversion for whatever encodings the program's and the machine's configuration put in effect. For example, the TCHAR-based functions and macros can use different encodings depending on how the program is configured, but they always work with each other because they all use a consistent TCHAR encoding, whatever that happens to be in a given configuration.
Upvotes: 2
Reputation: 106530
The "Unicode character set" setting means that the Windows APIs you call take wchar_t strings and communicate with your program in UTF-16. If your program uses "narrow" char strings, you must convert them from whatever character set your input uses into UTF-16.
The "Unicode character set" setting does not cause anything to be interpreted as UTF-8.
Upvotes: 1