miko3k

Reputation: 67

How does one convert std::u16string -> std::wstring using <codecvt>?

I found a bunch of questions on a similar topic, but nothing regarding wide-to-wide conversion with <codecvt>, which is supposed to be the correct choice in modern code.

The std::codecvt_utf16<wchar_t> seems to be a logical choice to perform the conversion.

However, std::wstring_convert seems to expect std::string at one end; the method names from_bytes and to_bytes emphasize this purpose.
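
To illustrate the mismatch, this is roughly the interface as I understand it (just a sketch):

#include <codecvt>
#include <locale>
#include <string>

std::wstring_convert<std::codecvt_utf16<wchar_t>, wchar_t> conv;
// from_bytes: std::string (bytes) -> std::wstring
// to_bytes:   std::wstring       -> std::string (bytes)
// There is no overload taking a std::u16string directly:
// std::wstring ws = conv.from_bytes(u16str); // does not compile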

The best solution I have so far is something like std::copy, which might work for my specific case, but it seems rather low-tech and probably not entirely correct either.
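
Roughly what I have in mind (a sketch; it can only be correct when wchar_t is also a 16-bit, UTF-16 type):

#include <algorithm>
#include <iterator>
#include <string>

std::u16string u16 = u"hello";
std::wstring ws;
// Copies code units one by one, with an implicit char16_t -> wchar_t
// conversion per element; no actual transcoding happens.
std::copy(u16.begin(), u16.end(), std::back_inserter(ws));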

I have a strong feeling that I am missing something rather obvious.

Cheers.

Upvotes: 1

Views: 1852

Answers (2)

Jorg K

Reputation: 91

> you cannot convert directly from std::u16string to std::wstring (and vice versa) with them. You will have to convert to an intermediate UTF-8 std::string first, and then convert that afterwards

This doesn't appear to be the case, as the answer to "clang: converting const char16_t* (UTF-16) to wstring (UCS-4)" shows:

#include <codecvt>
#include <locale>
#include <string>

std::u16string s = u"hello";
// Reinterpret the UTF-16 code units as a byte stream and convert to wchar_t.
std::wstring_convert<std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>,
                     wchar_t> conv;
std::wstring ws = conv.from_bytes(
                     reinterpret_cast<const char*>(&s[0]),
                     reinterpret_cast<const char*>(&s[0] + s.size()));
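
Note that this works by reinterpreting the char16_t buffer as raw bytes, so the std::little_endian flag must match the host byte order; on a big-endian platform the flag would be dropped.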

Upvotes: 0

Remy Lebeau

Reputation: 595369

The std::wstring_convert and std::codecvt... classes are deprecated as of C++17. There is no longer a standard way to convert between the various string classes.

If your compiler still supports the classes, you can certainly use them. However, you cannot convert directly from std::u16string to std::wstring (and vice versa) with them. You will have to convert to an intermediate UTF-8 std::string first, and then convert that afterwards, e.g.:

std::u16string utf16 = ...;

// UTF-16 -> UTF-8
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> utf16conv;
std::string utf8 = utf16conv.to_bytes(utf16);

// UTF-8 -> wchar_t (UCS-2 or UCS-4, depending on the size of wchar_t)
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> wconv;
std::wstring wstr = wconv.from_bytes(utf8);

Just know that this approach will break when the classes are eventually dropped from the standard library.

Using std::copy() (or simply the various std::wstring construct/assign methods that take iterators or data pointers) will work only on Windows, where wchar_t and char16_t are both 16 bits in size and represent UTF-16:

std::u16string utf16 = ...;
std::wstring wstr;

#ifdef _WIN32
wstr.reserve(utf16.size());
std::copy(utf16.begin(), utf16.end(), std::back_inserter(wstr));
/*
or: wstr = std::wstring(utf16.begin(), utf16.end());
or: wstr.assign(utf16.begin(), utf16.end());
or: wstr = std::wstring(reinterpret_cast<const wchar_t*>(utf16.c_str()), utf16.size());
or: wstr.assign(reinterpret_cast<const wchar_t*>(utf16.c_str()), utf16.size());
*/
#else
// do something else ...
#endif

But on other platforms, where wchar_t is 32 bits in size and represents UTF-32, you will need to actually convert the data, using the code shown above, a platform-specific API, or a 3rd-party Unicode library that can do the conversion, such as libiconv, ICU, etc.
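
For example, decoding the surrogate pairs by hand is not much code (a minimal sketch, assuming a 32-bit wchar_t; lone surrogates are replaced with U+FFFD, and the utf16_to_wide name is just for illustration):

#include <cstddef>
#include <string>

// UTF-16 (char16_t) -> UTF-32 code points stored in a 32-bit wchar_t string.
std::wstring utf16_to_wide(const std::u16string& in) {
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char16_t c = in[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine into one code point.
            char16_t lo = in[++i];
            out.push_back(static_cast<wchar_t>(
                0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00)));
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            out.push_back(L'\xFFFD'); // lone surrogate
        } else {
            out.push_back(static_cast<wchar_t>(c));
        }
    }
    return out;
}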

Upvotes: 2
