jww

Reputation: 102286

Convert string to wstring, encoding issues

I've read Stroustrup's Appendix D (with particular attention to locales and codecvt). Stroustrup does not give a good codecvt and widen example (IMHO). I've been tweaking code found on the internet with no luck. I've also tried imbuing stringstreams, without success.

Would anyone be able to show (and explain) the code to go from UTF-8 to UTF-16 (or UTF-32)? NOTE: I do not know the size of the input/output string in advance, so I expect the solution to use reserve and a back_inserter. Please don't use out.resize(in.length()*2).

When finished, it would be great if the code actually worked (it's amazing how much broken code is out there). Please make sure the following round-trips. The bytes below are the Han character for 'bone' in UTF-8 and UTF-{16|32}.

const std::string n("\xe9\xaa\xa8");
const std::wstring w = L"\u9aa8";

My apologies for a basic question. On Windows, I use the Win32 API and don't have these problems moving between encodings.

Upvotes: 2

Views: 2193

Answers (2)

anno

Reputation: 5989

Just use UTF8-CPP:

std::wstring conversion;
utf8::utf8to16(utf8_str.begin(), utf8_str.end(), std::back_inserter(conversion));

Caveat: this will only work where wchar_t is 2 bytes long (Windows).

For a portable solution you could do:

std::vector<unsigned short> utf16line; // uint16_t if you can
utf8::utf8to16(utf8_line.begin(), utf8_line.end(), std::back_inserter(utf16line));

But then you lose the string support. Hopefully, we'll get char16_t soon enough.
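
C++11 did add char16_t, char32_t, std::u16string and std::u32string, so the string support no longer has to be given up. As a rough sketch of what such a conversion does under the hood, assuming no library at all: the helper names utf8_to_utf32, utf32_to_utf16, to_utf32 and to_utf16 below are made up for illustration and are not part of UTF8-CPP or the standard library. Everything is written through an output iterator, so it works with reserve plus std::back_inserter as the question asks.

```cpp
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 from [first, last) and write one
// char32_t per code point to out. Throws on malformed input.
template <typename InputIt, typename OutputIt>
OutputIt utf8_to_utf32(InputIt first, InputIt last, OutputIt out) {
    // Smallest code point that legitimately needs 0..3 continuation bytes.
    static const char32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};
    while (first != last) {
        const unsigned char lead = static_cast<unsigned char>(*first++);
        char32_t cp;
        int trailing;  // number of continuation bytes that must follow
        if (lead < 0x80)                { cp = lead;        trailing = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; trailing = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trailing = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trailing = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        for (int i = 0; i < trailing; ++i) {
            if (first == last)
                throw std::invalid_argument("truncated UTF-8 sequence");
            const unsigned char c = static_cast<unsigned char>(*first++);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        if (cp < min_cp[trailing] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");
        *out++ = cp;
    }
    return out;
}

// Hypothetical helper: encode valid code points as UTF-16 code units,
// using a surrogate pair for anything above U+FFFF.
template <typename InputIt, typename OutputIt>
OutputIt utf32_to_utf16(InputIt first, InputIt last, OutputIt out) {
    for (; first != last; ++first) {
        char32_t cp = *first;
        if (cp < 0x10000) {
            *out++ = static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;
            *out++ = static_cast<char16_t>(0xD800 + (cp >> 10));    // high
            *out++ = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low
        }
    }
    return out;
}

std::u32string to_utf32(const std::string& in) {
    std::u32string out;
    out.reserve(in.size());  // never more code points than input bytes
    utf8_to_utf32(in.begin(), in.end(), std::back_inserter(out));
    return out;
}

std::u16string to_utf16(const std::string& in) {
    const std::u32string cps = to_utf32(in);
    std::u16string out;
    out.reserve(cps.size());  // starting estimate; grows for surrogate pairs
    utf32_to_utf16(cps.begin(), cps.end(), std::back_inserter(out));
    return out;
}
```

With the bytes from the question, to_utf16("\xe9\xaa\xa8") should yield the single code unit 0x9AA8. Going through UTF-32 first costs an extra temporary but keeps the two steps easy to check separately.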

Upvotes: 4

user405725

It seems pretty obvious that he was smoking weed. As for the codepage conversions, look no further than iconv!
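
A minimal sketch of the iconv route, assuming a POSIX system whose iconv() takes char** for the input buffer (glibc does; some older platforms declare it const char**) and which recognizes the encoding names "UTF-8" and "UTF-16LE". The wrapper name iconv_convert is hypothetical. It converts in fixed-size chunks and appends each chunk to the result, so the output size does not need to be known in advance.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: convert the bytes of `in` from encoding `from`
// to encoding `to`, returning the raw converted bytes.
std::string iconv_convert(const std::string& in, const char* to,
                          const char* from) {
    iconv_t cd = iconv_open(to, from);
    if (cd == reinterpret_cast<iconv_t>(-1))
        throw std::runtime_error("iconv_open failed: unsupported conversion");

    std::string out;
    char buf[256];
    // iconv() does not modify the input bytes, but its signature wants char*.
    char* src = const_cast<char*>(in.data());
    std::size_t src_left = in.size();

    while (src_left > 0) {
        char* dst = buf;
        std::size_t dst_left = sizeof buf;
        const std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        const int err = errno;  // save before anything else can change it
        out.append(buf, sizeof buf - dst_left);
        // E2BIG only means the chunk buffer filled up; loop and continue.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or truncated input");
        }
    }
    iconv_close(cd);
    return out;
}
```

The result is a byte string; for UTF-16LE each code unit occupies two bytes, low byte first, so the character from the question comes back as the bytes 0xA8 0x9A. On platforms where iconv lives in a separate library, linking with -liconv may be needed.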

Upvotes: 2
