evilmandarine

Reputation: 4543

C - WinAPI: Why do UTF-8 encoded characters show as Chinese in a ListView?

I am trying to add text to a ListView, but the text appears as Chinese characters despite multiple attempts to use the encoding correctly. Here is a summary of the code, from the JSON file to the ListView:

// json file (UTF-8 without BOM)
"name": "abcdefghijklmnop" // shows as Chinese
// "name": "ササササササササササ" // Japanese does not show correctly either

// declare structure and array; malloc array later
struct user {
    char* name;
    // char name[16];
};
struct user *users;

// read from json
cJSON* json_name = cJSON_GetObjectItemCaseSensitive(json_user, "name");

// set name (in a loop)
users[i].name = json_name->valuestring;

// create listview
HWND hWndListView = CreateWindowExW(NULL,
    WC_LISTVIEW,
    L"Test Listview",
    WS_CHILD | WS_VISIBLE | LVS_REPORT | LVS_EDITLABELS,
    // ...

// add column definitions
LVCOLUMN lvc = { 0 };
lvc.mask = LVCF_TEXT | LVCF_SUBITEM | LVCF_WIDTH | LVCF_FMT;
lvc.fmt = LVCFMT_LEFT;

// column example
lvc.iSubItem = 0;
lvc.cx = 100;
lvc.pszText = TEXT("A");
ListView_InsertColumn(hWndListView, 0, &lvc);

// debug: check it's unicode
BOOL what = IsWindowUnicode(hWndListView); // 1

// debug
#ifdef UNICODE
    int i = 0; // this line is compiled, so UNICODE is defined
#endif

What I have tried:

ListView_SetItemText(hWndListView, 2, 1, servers[0].name);         // chinese
ListView_SetItemText(hWndListView, 2, 1, *servers[0].name);        // access violation
ListView_SetItemText(hWndListView, 2, 1, (LPWSTR)servers[0].name); // chinese
ListView_SetItemText(hWndListView, 2, 1, L"%s", servers[0].name);  // %s
ListView_SetItemText(hWndListView, 3, 0, TEXT(servers[0].name));   // does not compile

In the VS2019 debugger, adding a watch with the "s8" format specifier (to view the string as UTF-8) displays the text correctly. It is only when the text is added to the control that it does not appear correctly. Note that this works:

ListView_SetItemText(hWndListView, 2, 2, TEXT("ササ"));

Question: what am I missing for my data to be correctly displayed?

Upvotes: 0

Views: 439

Answers (1)

Remy Lebeau

Reputation: 595827

What you are experiencing is commonly known as "Mojibake". It happens when 8-bit character data is misinterpreted as 16-bit Unicode data.

You are creating a Unicode window for your ListView, due to your use of CreateWindowExW(). As such, you must give the ListView proper UTF-16 encoded text using wchar_t characters, but you are instead giving it UTF-8 encoded text via a char* pointer that is merely being type-cast to wchar_t*. The cast only changes how the pointer is interpreted; the character data it points at is still UTF-8, not UTF-16.
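To see why plain ASCII text turns into Chinese in particular, here is a minimal portable sketch that mimics what the (LPWSTR) cast does on little-endian Windows: no bytes are converted, each pair of UTF-8 bytes is simply read as one 16-bit code unit. The helper names reinterpret_as_utf16le() and is_cjk_ideograph() are hypothetical, not Windows APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a NUL-terminated byte string as little-endian
 * 16-bit code units, the way a (wchar_t*) cast on x86/x64 Windows does.
 * No conversion happens; bytes i and i+1 become one unit.
 * Writes at most max_units units and returns how many were written.
 * An odd trailing byte is ignored in this sketch. */
size_t reinterpret_as_utf16le(const char *bytes, uint16_t *out, size_t max_units)
{
    assert(bytes != NULL && out != NULL);
    size_t nbytes = strlen(bytes);
    size_t n = 0;
    for (size_t i = 0; i + 1 < nbytes && n < max_units; i += 2) {
        /* Low byte first, because Windows on x86/x64 is little-endian. */
        out[n++] = (uint16_t)((unsigned char)bytes[i] |
                              ((unsigned char)bytes[i + 1] << 8));
    }
    return n;
}

/* Hypothetical helper: nonzero if u lies in the CJK Unified Ideographs
 * block (U+4E00..U+9FFF). */
int is_cjk_ideograph(uint16_t u)
{
    return u >= 0x4E00 && u <= 0x9FFF;
}
```

For "abcdefghijklmnop", 'a' (0x61) and 'b' (0x62) become the single unit 0x6261, and all eight resulting units (0x6261 through 0x706F) fall in the CJK Unified Ideographs block, which is why the ListView renders Chinese characters.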

You need to actually convert the UTF-8 encoded char data to UTF-16 encoded wchar_t data instead, such as with MultiByteToWideChar() (or equivalent), e.g.:

struct user {
    wchar_t* name;
    // wchar_t name[16];
};

...

int len = MultiByteToWideChar(CP_UTF8, 0, json_name->valuestring, -1, NULL, 0);
users[i].name = (wchar_t*) malloc(len * sizeof(wchar_t));
MultiByteToWideChar(CP_UTF8, 0, json_name->valuestring, -1, users[i].name, len);

...

ListView_SetItemText(hWndListView, 2, 1, users[0].name);

...

free(users[i].name);
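For illustration only, here is a minimal portable sketch of the kind of decoding MultiByteToWideChar(CP_UTF8, ...) performs. The name utf8_to_utf16() is hypothetical; it writes uint16_t units because wchar_t is 32 bits on most non-Windows compilers, and it does only basic validation (for example, it does not reject overlong encodings). In real Windows code, keep using MultiByteToWideChar() as in the code above. Like that API, it supports a sizing call with a NULL buffer followed by a converting call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not a Windows API: decodes a NUL-terminated
 * UTF-8 string into UTF-16 code units, including the terminating 0.
 * Pass out == NULL to get the required number of units (sizing call),
 * then call again with a buffer of at least that many units.
 * Returns 0 on malformed input or if out_len is too small. */
size_t utf8_to_utf16(const char *src, uint16_t *out, size_t out_len)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL);
    for (;;) {
        unsigned char lead = *s++;
        uint32_t cp;
        int extra; /* number of continuation bytes that follow */

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return 0; /* invalid lead byte */

        for (int k = 0; k < extra; k++, s++) {
            if ((*s & 0xC0) != 0x80)
                return 0; /* truncated or malformed sequence */
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0; /* not a valid Unicode scalar value */

        size_t units = (cp >= 0x10000) ? 2 : 1;
        if (out != NULL) {
            if (n + units > out_len)
                return 0; /* caller's buffer is too small */
            if (units == 2) { /* split into a surrogate pair */
                uint32_t v = cp - 0x10000;
                out[n]     = (uint16_t)(0xD800 | (v >> 10));
                out[n + 1] = (uint16_t)(0xDC00 | (v & 0x3FF));
            } else {
                out[n] = (uint16_t)cp;
            }
        }
        n += units;
        if (lead == 0)
            break; /* the terminator has been counted (and copied) */
    }
    return n;
}
```

Decoding the UTF-8 bytes of "ササ" (E3 82 B5 E3 82 B5) yields the units 0x30B5, 0x30B5, 0, which is the UTF-16 data a Unicode ListView expects. ASCII text simply widens one byte per unit, and characters outside the BMP become surrogate pairs.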

TEXT("ササ") works because you have UNICODE defined in your project, so TEXT() prefixes its input string literal with L, i.e. L"ササ", making it a proper UTF-16 encoded wchar_t string literal.
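The same compile-time prefixing also explains why TEXT(servers[0].name) does not compile. With UNICODE defined, winnt.h defines TEXT() roughly as a token paste of L onto its argument. The sketch below copies that shape under the hypothetical name MY_TEXT so it can be compiled anywhere; it is an approximation of the SDK header, not a copy of it.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins shaped like winnt.h's TEXT()/__TEXT() when UNICODE
 * is defined: the L prefix is pasted onto the argument's first token. */
#define MY_TEXT_(quote) L##quote
#define MY_TEXT(quote)  MY_TEXT_(quote)

/* MY_TEXT("ab") expands to L"ab", a wide literal: 2 chars + terminator. */
static const wchar_t demo[] = MY_TEXT("ab");

/* By contrast, MY_TEXT(name) would expand to Lname, a different (and
 * undeclared) identifier, so the macro cannot be applied to a variable. */

size_t demo_length(void)
{
    return wcslen(demo);
}
```

Applied to servers[0].name, the paste produces the identifier Lservers, which does not exist, so the compiler rejects it. TEXT() can only turn a string literal into a wide literal at compile time; converting data read at runtime still requires MultiByteToWideChar().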

Upvotes: 2
