Reputation: 2469
I'm trying to convert a multi-byte (UTF-8) string to a wide-character string, and mbsnrtowcs always fails. Here are the input and expected strings:
const char* pInputMultiByteString = "A quick brown Fox jumps \xC2\xA9 over the lazy Dog.";
const wchar_t* pExpectedWideString = L"A quick brown Fox jumps \x00A9 over the lazy Dog.";
The special character is the copyright symbol.
This conversion works fine when I use the Windows MultiByteToWideChar routine, but since that API is not available on Linux, I have to use mbsnrtowcs, which is failing. I've tried other characters as well and it always fails. The only exception is an ASCII-only input string, for which mbsnrtowcs works fine. What am I doing wrong?
Upvotes: 0
Views: 1561
Reputation: 2469
SOLUTION: By default each C program uses the "C" locale, so I had to call setlocale(LC_CTYPE, "..."), which means that it'll use my environment's locale, i.e. en_US.utf8, and the conversion worked.
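For anyone landing here later, a minimal sketch of the fix. The helper names use_utf8_ctype and utf8_to_wide are made up for this example, and the locale names in the candidate list are assumptions about what is installed on a given system. Passing an empty string to setlocale selects the locale from the environment; the other names are only tried as fallbacks.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes mbsnrtowcs in <wchar.h> */
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: pick an LC_CTYPE locale that decodes UTF-8.
 * "" means "use the environment's locale"; the remaining names are
 * assumptions and may not exist on every system.
 * Returns 1 on success, 0 if no candidate decodes UTF-8. */
static int use_utf8_ctype(void)
{
    static const char *const candidates[] = {
        "", "C.UTF-8", "en_US.UTF-8", "en_US.utf8"
    };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) == NULL)
            continue;                       /* locale not available */
        mbstate_t st;
        wchar_t wc;
        memset(&st, 0, sizeof st);
        /* Probe: the two bytes C2 A9 form one character only in UTF-8. */
        if (mbrtowc(&wc, "\xC2\xA9", 2, &st) == 2)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: convert a NUL-terminated multibyte string using
 * the current LC_CTYPE locale. dst_len is the capacity of dst in wide
 * characters, including room for the terminator.
 * Returns the number of wide characters written (excluding the
 * terminator), or (size_t)-1 on an invalid sequence or a too-small dst. */
static size_t utf8_to_wide(const char *src, wchar_t *dst, size_t dst_len)
{
    assert(src != NULL && dst != NULL && dst_len > 0);

    mbstate_t state;
    memset(&state, 0, sizeof state);        /* initial conversion state */

    const char *p = src;
    /* The byte limit includes the terminating NUL so it is converted too. */
    size_t n = mbsnrtowcs(dst, &p, strlen(src) + 1, dst_len, &state);
    if (n == (size_t)-1)
        return (size_t)-1;                  /* invalid multibyte sequence */
    if (p != NULL)
        return (size_t)-1;                  /* dst filled before the NUL */
    return n;
}
```

Call use_utf8_ctype() once at startup, then utf8_to_wide(pInputMultiByteString, buffer, buffer_length). Without the setlocale call the program stays in the "C" locale, which is why only ASCII input converted.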
Upvotes: 0
Reputation: 1822
UTF is not a multibyte string (although it is true that Unicode characters will be represented using more than one byte). A multibyte string is a string that uses a certain code page to represent characters, and some of them will use more than one byte.
Since you are combining ANSI chars and UTF chars, you should use UTF-8.
So trying to convert UTF to wchar_t (which on Windows is UTF-16 and on Linux is UTF-32) using mbsnrtowcs just can't be done.
If you use UTF-8, you should look into a Unicode handling library for that. For most tasks I recommend using UTF8-CPP from http://utfcpp.sourceforge.net/
You can read more on Unicode and UTF-8 on Wikipedia.
Upvotes: 1
Reputation: 8075
MultiByteToWideChar has a parameter where you specify the code page, but mbsnrtowcs doesn't. On Linux, have you set LC_CTYPE in your locale to specify UTF-8?
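One way to check this from inside the program is to ask which character encoding the current LC_CTYPE locale uses. A rough sketch, assuming a POSIX system with nl_langinfo; the function name ctype_is_utf8 is invented for this example, and the exact spelling "UTF-8" is what glibc reports, so other platforms may spell it differently.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes nl_langinfo in <langinfo.h> */
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the current LC_CTYPE locale reports
 * a UTF-8 codeset, 0 otherwise. Assumes the codeset name is spelled
 * "UTF-8", as glibc does. */
static int ctype_is_utf8(void)
{
    const char *codeset = nl_langinfo(CODESET);
    return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}
```

If this returns 0 right before the mbsnrtowcs call, the conversion is running in a non-UTF-8 locale (typically the default "C" locale) and non-ASCII input will not decode as UTF-8.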
Upvotes: 0