id48jkdl

Reputation: 11

Win32 Resource Dialog Text - UTF-8 - Only displays first character of each string

I'm looking to move from ASCII to UTF-8 everywhere in my Windows Desktop (Win32/MFC) application, as opposed to the usual move to UTF-16. The idea is that fewer changes will be needed, and interfacing with external systems that speak UTF-8 will take less work.

The problem is that the static control and the button in a dialog box loaded from the resource file only ever display the first character of their kanji text. Should resource files work just fine with UTF-8?

[Screenshot: dialog illustrating the problem]

UTF-8 strings from the String Table in the resource file appear to be read and displayed correctly, but text placed directly on the dialogs themselves is not.

I am testing using kanji characters.

[Screenshot: how the dialog appears in the resource editor]

I have the following setup, which includes a code-page pragma in the .rc file and a UTF-8 application manifest:

Using UTF-8 everywhere means std::string, CStringA, and the -A Win32 functions, selected implicitly by setting "Advanced/Character Set" to "Not Set". The resource file is also in UTF-8, including the dialogs with their text, the String Tables, and so on. If I set it to "Use Unicode Character Set", my understanding is that UTF-16 and the -W functions become the default everywhere, which is how Windows has historically supported Unicode.

The pragma appears to work, as the Resource Editor in Visual Studio does not clobber the .rc file into UTF-16LE. Also, the manifest appears to work as the MessageBox() (MessageBoxA) function displays text from the String Table correctly. Without the manifest, the MessageBox() displays question marks.

      TCHAR buffer[512];
      // LoadString takes the buffer size in characters and null-terminates the result.
      LoadString(hInst, IDS_TESTKANJI, buffer, _countof(buffer));
      MessageBox(hWnd, buffer, _T("Caption"), MB_OK);

[Screenshot: successful message box]

If I set the Character Set to "Use Unicode Character Set", everything appears to work as expected - all characters are displayed.

[Screenshot: dialog successfully showing kanji]

My suspicion is that the encoding goes UTF-8 (.rc file) -> UTF-16 (internal representation) -> ASCII (dialog text loading?), and that the loader hits a null byte in the UTF-16 representation and stops after reading the first character.

If I call SetDlgItemText() on my static control using text from the String Table, the static control will show all the characters correctly:

case WM_COMMAND:
   if (LOWORD(wParam) == IDOK)
   {
      TCHAR buffer[512];
      LoadString(hInst, IDS_TESTKANJI, buffer, _countof(buffer));
      // With "Not Set" this resolves to SetDlgItemTextA, which receives the UTF-8 bytes.
      SetDlgItemText(hDlg, IDC_STATIC, buffer);
      ...

Upvotes: 0

Views: 710

Answers (2)

yurik42

Reputation: 1

It does not stop after the first character. It stops after reading as many bytes as there are Unicode characters in the string. I ran into a similar problem when converting a legacy multibyte app to UTF-8. My ad hoc solution is to pad the strings with extra spaces (one space per non-ASCII character), e.g. instead of "€100" use "€100 ".
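
To make the byte-count explanation concrete, here is a small portable sketch that models the behaviour described above. It is only a model built on that description, not Windows code and not a confirmed account of what the dialog loader does internally. The helper names `Utf16Units`, `TruncateLikeDialog` and `PaddingNeeded` are hypothetical, and the input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Number of UTF-16 code units needed to represent a valid UTF-8 string.
// 1-, 2- and 3-byte sequences map to one unit; 4-byte sequences map to a
// surrogate pair (two units). Continuation bytes (10xxxxxx) add nothing.
std::size_t Utf16Units(const std::string& utf8) {
    std::size_t units = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) == 0x80) continue;  // continuation byte
        units += (c >= 0xF0) ? 2 : 1;      // 4-byte lead byte -> surrogate pair
    }
    return units;
}

// ASSUMED model of the dialog text loading: keep only as many UTF-8 bytes
// as there are UTF-16 code units in the string.
std::string TruncateLikeDialog(const std::string& utf8) {
    return utf8.substr(0, Utf16Units(utf8));
}

// Number of trailing spaces to append so that, under the model above,
// every byte of the original text survives the truncation.
std::size_t PaddingNeeded(const std::string& utf8) {
    return utf8.size() - Utf16Units(utf8);
}
```

Under this model, a string of three kanji (9 bytes, 3 UTF-16 units) is cut to 3 bytes, which is exactly the first character and matches what the question reports. The model also suggests that the padding depends on the byte length of each character: "€" takes three bytes in UTF-8, so "€100" would need two trailing spaces rather than one, while 2-byte characters need one each.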

Upvotes: 0

id48jkdl

Reputation: 11

It seems that the current answer to displaying UTF-8 text on dialogs is to set the text manually in code, using a function such as SetDlgItemText() with the UTF-8 string, rather than relying on the resource loading done by the dialog creation code itself. With the UTF-8 manifest, the -A functions are called, and they set the UTF-8 text just fine.

You can also call a -W function explicitly, converting UTF-8 -> UTF-16 before the call. See "UTF-8 text in MFC application that uses Multibyte character set".

See also Microsoft's documentation for the CreateDialogIndirectA macro (winuser.h), which is unusually explicit on this point: "All character strings in the dialog box template, such as titles for the dialog box and buttons, must be Unicode strings."
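
The explicit -W route could look roughly like the sketch below. `MultiByteToWideChar` and `SetDlgItemTextW` are the real Win32 calls; the wrapper names `Utf8ToUtf16` and `SetDlgItemTextUtf8` are hypothetical helpers introduced only for illustration, and error handling is reduced to returning an empty string.

```cpp
#ifdef _WIN32  // Windows-only sketch; the guard keeps non-Windows builds from breaking
#include <windows.h>
#include <string>

// Convert a UTF-8 string to UTF-16. Returns an empty string on empty or
// invalid input (MB_ERR_INVALID_CHARS makes the API reject bad sequences).
std::wstring Utf8ToUtf16(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    const int srcLen = static_cast<int>(utf8.size());
    // First call: ask how many UTF-16 code units are required.
    const int dstLen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                           utf8.data(), srcLen, nullptr, 0);
    if (dstLen <= 0) return std::wstring();
    std::wstring wide(static_cast<size_t>(dstLen), L'\0');
    // Second call: perform the conversion into the sized buffer.
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), srcLen, &wide[0], dstLen);
    return wide;
}

// Hypothetical helper: set a control's text from UTF-8 by calling the -W
// function directly, regardless of the project's Character Set setting.
BOOL SetDlgItemTextUtf8(HWND hDlg, int controlId, const std::string& utf8) {
    return SetDlgItemTextW(hDlg, controlId, Utf8ToUtf16(utf8).c_str());
}
#endif
```

A call such as `SetDlgItemTextUtf8(hDlg, IDC_STATIC, text)` from the WM_INITDIALOG handler would then replace the text that came from the dialog template.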

Upvotes: 0
