Does Windows produce a UTF-8 sequence for the ANSI version of WM_CHAR? Why can't I see it?

Question

In the current (2023-01) Microsoft documentation for WM_CHAR, https://learn.microsoft.com/en-us/windows/win32/inputdev/wm-char , Microsoft says:

... Otherwise [i.e. when using the ANSI version of RegisterClass], the system provides characters in the current process code page, which can be set to UTF-8 in Windows Version 1903 (May 2019 Update) and newer.

But I just can NOT see WM_CHAR delivering a Unicode character as a UTF-8 sequence. Am I doing something wrong, or is the documentation wrong or misleading?

I ran the experiments on Win10 21H2 using Keyview2A.exe v1.8, which is based on the Keyview2 demo program from Charles Petzold's famous book Programming Windows, 5th ed. (1998).

First, the non-UTF8ACP case, to show that KeyviewA works OK.

I type the Chinese character 电, which is U+7535 and is encoded in GBK as B5 E7.

Second, in the UTF8ACP case, KeyviewA does NOT receive a UTF-8 sequence.

I just got 0x3F ('?'), sigh!
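
For reference, if WM_CHAR really delivered UTF-8 here, U+7535 would have to arrive as the three bytes E7 94 B5 instead of 0x3F. Below is a minimal sketch of that encoding in plain C; utf8_encode is a hypothetical helper written only for illustration, not a Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Windows API): encode one Unicode scalar
 * value as UTF-8. Returns the number of bytes written to out (1..4),
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0x7535, buf) returns 3 and fills buf with E7 94 B5, so a UTF-8 delivery of 电 would be three WM_CHAR messages, not the 0x3F I observed.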

Third, what about characters from an SBCS?

SBCS = Single-byte character set. DBCS = Double-byte character set. MBCS = Multi-byte character set (a generic name covering SBCS, DBCS, and character sets of 3 or more bytes).

Most European countries use such a character set.

Type in some Russian letters:

Type in some Greek letters:

[20230121.c1] So far, I seem to have found the rule behind "enabling UTF8ACP" for an ANSI (narrow-char) program. It is summarized below:

The IME produces a Unicode value for every character the user inputs. When Windows needs to send that character to KeyviewA, it does the following:

1. Check the HKL value for the target HWND. Memo: KeyviewA itself can query this HKL value with GetKeyboardLayout(0).
2. Get the ANSI code page associated with the HKL value (let's call it curhkl). This can be obtained with GetLocaleInfo(LOWORD(curhkl), LOCALE_IDEFAULTANSICODEPAGE, ...), which writes the code page into its output buffer; call the result curcodepage.
3. Call WideCharToMultiByte(curcodepage, ...) to convert the Unicode value to an MBCS sequence.
   - If the MBCS sequence is a single byte (e.g. 0xE1), Windows sends one WM_CHAR message to Keyview2A with wParam=0xE1.
   - If the MBCS sequence is two bytes (e.g. 0xB5 0xE7), Windows sends two WM_CHAR messages to Keyview2A, both with wParam=0x3F.
