sameer karjatkar

Reputation: 2057

MultiByteToWideChar conversion

I have a Visual C++ application in Visual Studio 9.0, built with "Unicode Character Set" as the Character Set. We use the ATL macro A2T for conversions, but multibyte characters (Korean text) are not converted correctly. I looked at the code for A2T, and it passes CP_THREAD_ACP as the first parameter to MultiByteToWideChar. When I call the same API with CP_UTF8, I get the correct results. The comment for CP_THREAD_ACP says "current thread's ANSI code page". Given that I built the code with the Unicode Character Set, why does A2T not use UTF-8?

Upvotes: 1

Views: 1766

Answers (2)

bames53

Reputation: 88155

The A2T macro converts a string encoded in the relevant ANSI code page to a TCHAR string in the relevant TCHAR encoding. Since you've enabled 'Unicode character Set', TCHAR is wchar_t and the encoding is UTF-16, so A2T converts strings from an ANSI code page to UTF-16. (If you set the program to use ANSI instead of Unicode, then TCHAR is char, the encoding is the ANSI code page encoding, and A2T should become a no-op.)
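To make the TCHAR switch concrete, here is a minimal sketch of the mechanism. The names DEMO_UNICODE, DemoTChar and DEMO_TEXT are hypothetical stand-ins that I made up for illustration; the real definitions (TCHAR, _T/TEXT) live in the Windows and CRT headers and are keyed off the UNICODE/_UNICODE macros that the project setting defines.

```cpp
// DEMO_UNICODE is a hypothetical stand-in for the UNICODE/_UNICODE
// macros defined by the "Unicode Character Set" project option.
#define DEMO_UNICODE 1

#if DEMO_UNICODE
typedef wchar_t DemoTChar;       // wide build: wchar_t (UTF-16 on Windows)
#define DEMO_TEXT(s) L##s        // turns "abc" into L"abc"
#else
typedef char DemoTChar;          // narrow build: char in the ANSI code page
#define DEMO_TEXT(s) s           // leaves the literal unchanged
#endif

// The same source line yields a wide or a narrow string depending on
// the configuration, which is why A2T has to convert in one case and
// can do nothing in the other.
const DemoTChar* const kGreeting = DEMO_TEXT("hello");
```

Nothing in this switch says anything about UTF-8: it only selects between char and wchar_t.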

Since using CP_UTF8 produces the correct conversion, your strings are evidently encoded in UTF-8 rather than in an ANSI code page. The ANSI code page cannot be set to UTF-8 and therefore A2T is not the appropriate method for conversion.
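If the input really is UTF-8, one option is to call MultiByteToWideChar with CP_UTF8 directly instead of going through A2T. The following is only a sketch under that assumption: Utf8ToWide is a hypothetical helper name, the error handling is minimal, and the non-Windows branch exists only so the example also builds elsewhere (there it assumes a 32-bit wchar_t and does not reject overlong or surrogate sequences).

```cpp
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: decode UTF-8 bytes into a UTF-16 std::wstring.
std::wstring Utf8ToWide(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    const int inLen = static_cast<int>(utf8.size());
    // First call: ask how many wchar_t units the result needs.
    const int outLen = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), inLen, NULL, 0);
    if (outLen == 0) throw std::runtime_error("UTF-8 conversion failed");
    std::wstring wide(outLen, L'\0');
    // Second call: convert into the correctly sized buffer.
    if (MultiByteToWideChar(CP_UTF8, 0, utf8.data(), inLen, &wide[0], outLen) == 0)
        throw std::runtime_error("UTF-8 conversion failed");
    return wide;
}
#else
// Portable stand-in so the sketch compiles off Windows. It assumes a
// 32-bit wchar_t and stores one code point per element.
std::wstring Utf8ToWide(const std::string& utf8) {
    std::wstring wide;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char b = static_cast<unsigned char>(utf8[i]);
        // Number of continuation bytes implied by the lead byte.
        const int extra = b < 0x80 ? 0
                        : (b >> 5) == 0x06 ? 1
                        : (b >> 4) == 0x0E ? 2
                        : (b >> 3) == 0x1E ? 3 : -1;
        if (extra < 0 || i + extra >= utf8.size())
            throw std::runtime_error("invalid UTF-8 input");
        unsigned long cp = extra == 0 ? b : (b & (0x3F >> extra));
        for (int k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 input");
            cp = (cp << 6) | (c & 0x3F);
        }
        wide.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return wide;
}
#endif
```

For example, the UTF-8 bytes ED 95 9C EA B8 80 (the Korean word "한글") should come back as the two wide characters U+D55C and U+AE00. Whether UTF-8 is the right source encoding in every configuration of your program is a separate question.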


You need to analyze what determines the encoding of the input strings in your program and what output encoding you need and then choose an appropriate conversion routine.

Note that you're not just looking for a routine that converts between the encodings in use right now, on your machine, in the particular configuration you happen to be using. You're looking for a routine that uses the appropriate encoding under any supported configuration on any supported machine. In other words, the routine you choose needs to change the conversion it performs depending on the program's and the machine's configuration. For example, the TCHAR-based functions and macros can use different encodings depending on how the program is configured, but they always work with each other because they all use one consistent TCHAR encoding, whatever that happens to be in a given configuration.

Upvotes: 2

Billy ONeal

Reputation: 106530

The "unicode character set" means that Windows APIs use wchar_t, and communicate with your program using UTF-16. If your program uses "narrow" char strings, you must convert them yourself from whatever encoding your input actually uses into UTF-16.

"unicode character set" does not cause anything to be interpreted as UTF-8.

Upvotes: 1
