nochenon

Reputation: 346

mbstowcs() gives incorrect results in Windows

I am using mbstowcs() to convert a UTF-8 encoded char* string to wchar_t*, which is then passed to _wfopen(). However, _wfopen() always returns a NULL pointer, and I have traced the problem to the result of mbstowcs().

I prepared the following example and used printf for debugging...

size_t out_size;
int requiredSize;
wchar_t *wc_filename;
char *utf8_filename = "C:/Users/xxxxxxxx/Desktop/\xce\xb1\xce\xb2\xce\xb3.stdf";
wchar_t *expected_output = L"C:/Users/xxxxxxxx/Desktop/αβγ.stdf";

printf("input: %s, length: %d\n", utf8_filename, (int)strlen(utf8_filename));
printf("correct out length is %d\n", (int)wcslen(expected_output));

// conversion starts here
setlocale(LC_ALL, "C.UTF-8");

requiredSize = mbstowcs(NULL, utf8_filename, 0);
wc_filename = (wchar_t*)malloc( (requiredSize+1) * sizeof(wchar_t));

printf("requiredsize: %d\n", requiredSize);

if (!wc_filename) {
    // allocation failed
    free(wc_filename);
    return -1;
}
out_size = mbstowcs(wc_filename, utf8_filename, requiredSize + 1);
if (out_size == (size_t)(-1)) {
    // conversion failed
    free(wc_filename);
    return -1;
}
printf("out_size: %d, wchar name: %ls\n", (int)out_size, wc_filename);

if (wcscmp (wc_filename, expected_output) != 0) {
    printf("converted result is not correct\n");
}
free(wc_filename);

And the console output is:

input: C:/Users/xxxxxxxx/Desktop/αβγ.stdf, length: 37
correct out length is 34
requiredsize: 37
out_size: 37, wchar name: C:/Users/xxxxxxxx/Desktop/αβγ.stdf
converted result is not correct

I don't understand why expected_output and wc_filename appear to have the same content but different lengths. What did I do wrong here?

Upvotes: 1

Views: 305

Answers (2)

BugMeNot114514

Reputation: 69

The Universal CRT (UCRT) supports UTF-8 locales, but MSVCRT.DLL does not. When using MinGW, you need to link against UCRT.

Upvotes: 0

Adrian Mole

Reputation: 51815

The problem appears to be in your choice of locale name. Replacing the following:

setlocale(LC_ALL, "C.UTF-8");

with this:

setlocale(LC_ALL, "en_US.UTF-8");

fixes the issue on my system (Windows 10, MSVC, 64-bit build) – at least, the out_size and requiredSize are both 34 and the "converted result is not correct\n" message doesn't show. Using "en_GB.UTF-8" also worked.
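
The original mismatch of 37 versus 34 is consistent with the conversion widening each input byte to one wchar_t, which is what I would expect if the program stays in the default "C" locale: the three Greek letters occupy two bytes each in UTF-8, so the byte count exceeds the character count by three. Below is a minimal sketch of that arithmetic; the helper names utf8_code_points and utf8_extra_bytes are hypothetical, and the code assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a UTF-8 string by skipping continuation bytes,
   which all have the bit pattern 10xxxxxx.
   Assumption: the input is valid UTF-8 (no validation is done here). */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Number of bytes in the string beyond one per code point. */
size_t utf8_extra_bytes(const char *s)
{
    return strlen(s) - utf8_code_points(s);
}
```

For the path in the question, strlen reports 37 bytes while utf8_code_points reports 34, which matches the two lengths in the console output.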

I'm not sure if the C Standard actually defines what locale names are, but this question/answer may be helpful: Valid Locale Names.


Note: As mentioned in the comment by Mgetz, using setlocale(LC_ALL, ".UTF-8"); also works – I guess that would be the minimal and most portable locale name to use.

Second note: You can check whether the setlocale call succeeded by comparing its return value to NULL. With your original locale name, the following code prints the error message (but it does not if you remove the leading "C"):

    if (setlocale(LC_ALL, "C.UTF-8") == NULL) {
        printf("Error setting locale!\n");
    }
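
Putting the two checks together, here is a rough sketch of a conversion that refuses to continue when no UTF-8 locale can be set and that also tests the size query for failure. The function names select_utf8_locale and mb_to_wide are hypothetical, and the list of candidate locale names is only a guess, since which names are accepted depends on the C runtime.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Try a few UTF-8 locale names and return the first one setlocale()
   accepts, or NULL if none is accepted.
   Assumption: the candidate list is illustrative, not exhaustive. */
const char *select_utf8_locale(void)
{
    static const char *const candidates[] = {
        ".UTF-8", "en_US.UTF-8", "C.UTF-8"
    };
    size_t i;
    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (setlocale(LC_ALL, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}

/* Convert a multibyte string to a newly allocated wide string using the
   current locale. Returns NULL if the input is not valid in that locale
   or if allocation fails. The caller frees the result. */
wchar_t *mb_to_wide(const char *src)
{
    size_t needed;
    wchar_t *dst;

    assert(src != NULL);
    needed = mbstowcs(NULL, src, 0);   /* length query, as in the question */
    if (needed == (size_t)-1)
        return NULL;                   /* invalid multibyte sequence */

    dst = malloc((needed + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    if (mbstowcs(dst, src, needed + 1) == (size_t)-1) {
        free(dst);
        return NULL;
    }
    return dst;
}
```

A caller would first check that select_utf8_locale() returned a non-NULL name, then pass the buffer from mb_to_wide() to _wfopen() and free it afterwards.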

Upvotes: 2
