Reputation: 346
I am using mbstowcs() to convert a UTF-8 encoded char* string to wchar_t*, and the result is then passed to _wfopen(). However, I always get a NULL pointer from _wfopen(), and I have traced the problem to the result of mbstowcs().
I prepared the following example and used printf for debugging:
size_t out_size;
int requiredSize;
wchar_t *wc_filename;
const char *utf8_filename = "C:/Users/xxxxxxxx/Desktop/\xce\xb1\xce\xb2\xce\xb3.stdf";
const wchar_t *expected_output = L"C:/Users/xxxxxxxx/Desktop/αβγ.stdf";

// size_t values are cast to int so that %d is a matching format specifier
printf("input: %s, length: %d\n", utf8_filename, (int)strlen(utf8_filename));
printf("correct out length is %d\n", (int)wcslen(expected_output));

// conversion starts here
setlocale(LC_ALL, "C.UTF-8");
requiredSize = (int)mbstowcs(NULL, utf8_filename, 0);
wc_filename = (wchar_t*)malloc((requiredSize + 1) * sizeof(wchar_t));
printf("requiredsize: %d\n", requiredSize);
if (!wc_filename) {
    // allocation failed, nothing to free
    return -1;
}
out_size = mbstowcs(wc_filename, utf8_filename, requiredSize + 1);
if (out_size == (size_t)(-1)) {
    // conversion failed
    free(wc_filename);
    return -1;
}
printf("out_size: %d, wchar name: %ls\n", (int)out_size, wc_filename);
if (wcscmp(wc_filename, expected_output) != 0) {
    printf("converted result is not correct\n");
}
free(wc_filename);
And the console output is:
input: C:/Users/xxxxxxxx/Desktop/αβγ.stdf, length: 37
correct out length is 34
requiredsize: 37
out_size: 37, wchar name: C:/Users/xxxxxxxx/Desktop/αβγ.stdf
converted result is not correct
I just don't understand why expected_output and wc_filename have the same content but different lengths. What did I do wrong here?
Upvotes: 1
Views: 305
Reputation: 69
The Universal CRT (UCRT) supports UTF-8 locales, but MSVCRT.DLL does not. When using MinGW, you need to link against the UCRT.
Upvotes: 0
Reputation: 51815
The problem appears to be in your choice of locale name. Replacing the following:
setlocale(LC_ALL, "C.UTF-8");
with this:
setlocale(LC_ALL, "en_US.UTF-8");
fixes the issue on my system (Windows 10, MSVC, 64-bit build). At least, out_size and requiredSize are both 34 and the "converted result is not correct\n" message doesn't show. Using "en_GB.UTF-8" also worked.
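The 37 versus 34 in the question looks like the difference between the byte length and the character count: each of α, β and γ takes two bytes in UTF-8, so if the locale is not actually UTF-8 and every byte is converted on its own, the wide string presumably ends up three elements longer than expected. A small locale-independent sketch of that arithmetic follows; the utf8_code_points helper is hypothetical (not part of any library) and assumes well-formed UTF-8 input.

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a well-formed UTF-8 string
   by skipping continuation bytes, i.e. bytes of the form 10xxxxxx. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;   /* lead byte or plain ASCII: starts a new code point */
    }
    return count;
}
```

For the path in the question, strlen() reports 37 bytes while utf8_code_points() reports 34 code points, matching the two lengths in the console output.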
I'm not sure if the C Standard actually defines what locale names are, but this question/answer may be helpful: Valid Locale Names.
Note: As mentioned in the comment by Mgetz, using setlocale(LC_ALL, ".UTF-8"); also works. I guess that would be the minimal and most portable locale name to use.
Second note: You can check whether the setlocale call succeeded by comparing its return value to NULL. With your original locale name, the following code prints the error message (but it does not if you remove the leading "C"):
if (setlocale(LC_ALL, "C.UTF-8") == NULL) {
    printf("Error setting locale!\n");
}
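Putting the two notes together, a minimal sketch of the conversion with the locale check in place might look like the following. The helper names select_utf8_locale and utf8_to_wide are hypothetical, and the candidate list is an assumption: ".UTF-8" is the name that worked with the UCRT above, while "en_US.UTF-8" and "C.UTF-8" are included only because they are often available on other systems, with no guarantee that any of them exists on a given runtime.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: try a few UTF-8 locale names and return the first
   one that setlocale() accepts, or NULL if none is available. */
const char *select_utf8_locale(void)
{
    /* Assumed candidate list; which names exist depends on the C runtime. */
    static const char *const candidates[] = { ".UTF-8", "en_US.UTF-8", "C.UTF-8" };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; ++i) {
        if (setlocale(LC_ALL, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}

/* Hypothetical helper: convert a multibyte string to a newly allocated wide
   string using the current locale. Returns NULL on failure; the caller
   frees the result. */
wchar_t *utf8_to_wide(const char *utf8)
{
    assert(utf8 != NULL);
    size_t required = mbstowcs(NULL, utf8, 0);
    if (required == (size_t)-1)
        return NULL;                 /* invalid sequence for this locale */
    wchar_t *wide = malloc((required + 1) * sizeof *wide);
    if (wide == NULL)
        return NULL;                 /* allocation failed */
    if (mbstowcs(wide, utf8, required + 1) == (size_t)-1) {
        free(wide);
        return NULL;
    }
    return wide;
}
```

A caller would invoke select_utf8_locale() once at startup, treat a NULL return as an error instead of continuing in the default "C" locale, and only then pass the result of utf8_to_wide() on to _wfopen().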
Upvotes: 2