Reputation: 151
If I want to convert a string to UTF-16, say char * xmlbuffer, do I have to convert it to wchar_t * before encoding to UTF-16? And is the char * type required before encoding to UTF-8?
How are wchar_t and char related to UTF-8, UTF-16, UTF-32, or other transformation formats?
Thanks in advance for help!
Upvotes: 6
Views: 8852
Reputation: 95335
iconv is a POSIX function that can take care of the intermediate encoding step. You can use iconv_open to specify that you have UTF-8 input and that you want UTF-16 output. Then, using the handle returned from iconv_open, you can call iconv (specifying your input buffer and output buffer). When you are done, you must call iconv_close on that handle to free its resources.
You will have to consult your system's documentation to see which encodings iconv supports and how they are named (i.e. what to pass to iconv_open). For example, iconv on some systems expects "utf-8", while on others it may expect "UTF8".
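The open/convert/close sequence described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the function name utf8_to_utf16le is hypothetical, the encoding names "UTF-8" and "UTF-16LE" are assumptions that may need adjusting for your system's iconv, and the 256-byte scratch buffer is an arbitrary choice.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 bytes to UTF-16LE bytes via POSIX iconv.
// The result is returned as raw bytes in a std::string.
std::string utf8_to_utf16le(const std::string& in) {
    // Note the argument order: target encoding first, then source encoding.
    // The names are assumptions; check what your iconv accepts.
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    std::string out;
    char buf[256];
    // iconv takes a non-const char** for the input buffer.
    char* src = const_cast<char*>(in.data());
    std::size_t src_left = in.size();

    while (src_left > 0) {
        char* dst = buf;
        std::size_t dst_left = sizeof buf;
        std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        int err = errno;  // save before any other call can overwrite it
        out.append(buf, sizeof buf - dst_left);
        // E2BIG only means the scratch buffer filled up; loop and continue.
        if (rc == (std::size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or truncated input");
        }
    }
    iconv_close(cd);  // always release the conversion descriptor
    return out;
}
```

On a system whose iconv accepts these names, passing the three UTF-8 bytes of "hé" (68 C3 A9) should produce the four bytes 68 00 E9 00. Note that the input stays a plain char buffer throughout; no wchar_t is involved.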
Windows does not provide a version of iconv; instead it provides its own UTF conversion functions, MultiByteToWideChar and WideCharToMultiByte.
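// UTF-8 to UTF-16 (requires <windows.h> and <string>)
std::string input = ...
// First call: pass a NULL output buffer to get the required length in wchar_t units.
int utf16len = MultiByteToWideChar(CP_UTF8, 0, input.c_str(), static_cast<int>(input.size()),
                                   NULL, 0);
std::wstring output(utf16len, L'\0');
// Second call: perform the actual conversion into the sized buffer.
MultiByteToWideChar(CP_UTF8, 0, input.c_str(), static_cast<int>(input.size()),
                    &output[0], utf16len);
// UTF-16 to UTF-8
std::wstring input = ...
// First call: get the required length in bytes.
int utf8len = WideCharToMultiByte(CP_UTF8, 0, input.c_str(), static_cast<int>(input.size()),
                                  NULL, 0, NULL, NULL);
std::string output(utf8len, '\0');
// Second call: perform the actual conversion.
WideCharToMultiByte(CP_UTF8, 0, input.c_str(), static_cast<int>(input.size()),
                    &output[0], utf8len, NULL, NULL);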
Upvotes: 5
Reputation: 437376
No, you don't have to change data types.
About wchar_t: the standard says that
Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales.
Unfortunately, it does not say what encoding wchar_t is supposed to have; this is implementation-dependent. So, for example, given
auto s = L"foo";
you can make absolutely no assumption about what the value of the expression *s is.
However, you can use a std::string as an opaque sequence of bytes that represent text in any transformation format of your choice without issue. Just don't perform standard library string-related operations on it, since they work on individual char values rather than on characters.
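To illustrate the "opaque sequence of bytes" point, here is a small sketch that stores UTF-8 text in a std::string. The helper names utf8_code_points and sample_utf8 are hypothetical, and the counting function assumes its input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-8 byte sequence by
// skipping continuation bytes (those of the form 10xxxxxx).
// Assumes the input is valid UTF-8; it does no validation.
std::size_t utf8_code_points(const std::string& bytes) {
    std::size_t n = 0;
    for (unsigned char c : bytes) {
        if ((c & 0xC0) != 0x80) ++n;  // lead byte or ASCII byte
    }
    return n;
}

// "hé" encoded as UTF-8: 'h' is one byte, 'é' (U+00E9) is the two bytes C3 A9.
std::string sample_utf8() { return std::string("h\xC3\xA9"); }
```

Here sample_utf8().size() is 3 because size() counts bytes, while utf8_code_points(sample_utf8()) is 2. That mismatch is the reason member functions such as size(), substr(), or operator[] should not be treated as character-level operations on encoded text.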
Upvotes: 5
Reputation: 164
The size of wchar_t is compiler-dependent (for example, 2 bytes with MSVC on Windows and typically 4 bytes with GCC on Linux), so its relation to the various Unicode formats will vary.
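One way to see this variation is to compare how many code units a character outside the Basic Multilingual Plane occupies in each string type. The sketch below is an illustration under assumptions: the function names are hypothetical, it relies on the C++11 char16_t and char32_t types (not mentioned above), and it assumes a compiler that encodes u"" literals as UTF-16 and U"" literals as UTF-32, which is the case on common implementations.

```cpp
#include <cstddef>
#include <string>

// Code units needed for U+1F600 in each string type.
// Assumption: u"" literals are UTF-16, so this character needs a surrogate pair.
std::size_t utf16_units() { return std::u16string(u"\U0001F600").size(); }

// Assumption: U"" literals are UTF-32, so every code point is one unit.
std::size_t utf32_units() { return std::u32string(U"\U0001F600").size(); }

// For wchar_t the answer depends on the implementation: typically 2 units
// where wchar_t is 16 bits wide, and 1 unit where it is 32 bits wide.
std::size_t wide_units() { return std::wstring(L"\U0001F600").size(); }

// Width of wchar_t in bytes on the current implementation.
std::size_t wide_char_bytes() { return sizeof(wchar_t); }
```

Under those assumptions utf16_units() gives 2 and utf32_units() gives 1 everywhere, while wide_units() and wide_char_bytes() differ between platforms, which is why portable code should not assume that wchar_t means UTF-16 or UTF-32.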
Upvotes: 1