Reputation: 67
I found a bunch of questions on similar topics, but nothing about wide-to-wide conversion with <codecvt>, which is supposed to be the correct choice in modern code.
std::codecvt_utf16<wchar_t> seems to be the logical choice to perform the conversion.
However, std::wstring_convert seems to expect a std::string at one end. The method names from_bytes and to_bytes emphasize this purpose.
The best solution I have so far is something like std::copy, which might work for my specific case, but seems kind of low-tech and is probably not entirely correct either.
I have a strong feeling that I am missing something rather obvious.
Cheers.
Upvotes: 1
Views: 1852
Reputation: 91
The other answer states: "you cannot convert directly from std::u16string to std::wstring (and vice versa) with them. You will have to convert to an intermediate UTF-8 std::string first, and then convert that afterwards."
This doesn't appear to be the case, as the question "clang: converting const char16_t* (UTF-16) to wstring (UCS-4)" shows:
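// Assumes #include <codecvt>, <locale>, <string> and `using namespace std;`.
// little_endian must match the byte order of the char16_t data in memory.
u16string s = u"hello";
wstring_convert<codecvt_utf16<wchar_t, 0x10ffff, little_endian>,
                wchar_t> conv;
// Feed the UTF-16 code units to the converter as a raw byte range.
wstring ws = conv.from_bytes(
    reinterpret_cast<const char*>(&s[0]),
    reinterpret_cast<const char*>(&s[0] + s.size()));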
Upvotes: 0
Reputation: 595369
The std::wstring_convert and std::codecvt... classes are deprecated in C++17 onward. There is no longer a standard way to convert between the various string classes.
If your compiler still supports the classes, you can certainly use them. However, you cannot convert directly from std::u16string to std::wstring (and vice versa) with them. You will have to convert to an intermediate UTF-8 std::string first, and then convert that afterwards, e.g.:
std::u16string utf16 = ...;
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> utf16conv;
std::string utf8 = utf16conv.to_bytes(utf16);
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> wconv;
std::wstring wstr = wconv.from_bytes(utf8);
Just know that this approach will break when the classes are eventually dropped from the standard library.
Using std::copy() (or simply the various std::wstring data construct/assign methods) will work only on Windows, where wchar_t and char16_t are both 16 bits in size and represent UTF-16:
std::u16string utf16 = ...;
std::wstring wstr;
#ifdef _WIN32
wstr.reserve(utf16.size());
std::copy(utf16.begin(), utf16.end(), std::back_inserter(wstr));
/*
or: wstr = std::wstring(utf16.begin(), utf16.end());
or: wstr.assign(utf16.begin(), utf16.end());
or: wstr = std::wstring(reinterpret_cast<const wchar_t*>(utf16.c_str()), utf16.size());
or: wstr.assign(reinterpret_cast<const wchar_t*>(utf16.c_str()), utf16.size());
*/
#else
// do something else ...
#endif
But on other platforms, where wchar_t is 32 bits in size and represents UTF-32, you will need to actually convert the data, using the code shown above, or a platform-specific API or third-party Unicode library that can do the data conversion, such as libiconv, ICU, etc.
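If pulling in a library is not an option, the UTF-16 to UTF-32 step is small enough to write by hand. The following is only a minimal sketch, not part of the original answer: the function names utf16_to_wstring and wstring_to_utf16 are hypothetical, it assumes a 16-bit wchar_t holds UTF-16 and a wider wchar_t holds UTF-32, and replacing unpaired surrogates with U+FFFD is a design choice of this sketch rather than something the standard mandates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: UTF-16 code units -> std::wstring, without <codecvt>.
std::wstring utf16_to_wstring(const std::u16string& in) {
    std::wstring out;
    out.reserve(in.size());
    if (sizeof(wchar_t) == sizeof(char16_t)) {
        // Assumption: 16-bit wchar_t holds UTF-16 (e.g. Windows), so copy code units.
        for (char16_t cu : in) out.push_back(static_cast<wchar_t>(cu));
        return out;
    }
    // Assumption: wider wchar_t holds UTF-32, so decode surrogate pairs.
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        if (high && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine high and low surrogate into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: substitute the replacement character
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}

// Hypothetical helper for the reverse direction: std::wstring -> UTF-16.
std::u16string wstring_to_utf16(const std::wstring& in) {
    std::u16string out;
    out.reserve(in.size());
    if (sizeof(wchar_t) == sizeof(char16_t)) {
        for (wchar_t wc : in) out.push_back(static_cast<char16_t>(wc));
        return out;
    }
    for (wchar_t wc : in) {
        char32_t cp = static_cast<char32_t>(wc);
        // Values outside the Unicode range or inside the surrogate range are invalid.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        if (cp >= 0x10000) {
            // Split into a high/low surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

Calling utf16_to_wstring(utf16) would then take the place of the "do something else" branch in the #ifdef above. The size check is an ordinary if rather than a preprocessor test, so both branches are compiled on every platform and only the matching one runs.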
Upvotes: 2