Reputation: 102286
I've read Stroustrup's Appendix D (paying particular attention to locales and codecvt). Stroustrup does not give a good codecvt and widen example (IMHO). I've been trying to adapt snippets from the internet with no joy. I've also tried imbuing stringstreams, without success.
Would anyone be able to show (and explain) the code to go from a UTF-8 to a UTF-16 (or UTF-32) encoding? NOTE: I do not know the size of the input/output string in advance, so I expect the solution should use reserve and a back_inserter. Please don't use out.resize(in.length()*2).
When finished, it would be great if the code actually worked (it's amazing how much broken code is out there). Please make sure the following round-trips. The bytes below are the Han character for 'bone' in UTF-8 and UTF-{16|32}.
const std::string n("\xe9\xaa\xa8");
const std::wstring w = L"\u9aa8";
My apologies for a basic question. On Windows, I use the Win32 API and don't have these problems moving between encodings.
Upvotes: 2
Views: 2193
Reputation: 5989
Just use UTF8-CPP:
std::wstring conversion;
utf8::utf8to16(utf8_str.begin(), utf8_str.end(), std::back_inserter(conversion));
Caveat: this will only work where wchar_t is 2 bytes long (Windows).
For a portable solution you could do:
std::vector<unsigned short> utf16line; // uint16_t if you can
utf8::utf8to16(utf8_line.begin(), utf8_line.end(), back_inserter(utf16line));
But then you're losing the string support. Hopefully, we'll get char16_t soon enough.
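If pulling in a library is not an option, the same back_inserter pattern can be hand-rolled. What follows is only a minimal sketch, not how UTF8-CPP does it: the names utf8_to_utf32, append_utf16 and utf8_to_utf16 are made up for illustration, and it assumes a C++11 compiler so that char16_t, char32_t, std::u16string and std::u32string exist.

```cpp
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <string>

// Decode UTF-8 from [first, last), writing one code point per element to out.
// Throws std::runtime_error on malformed input.
template <typename InputIt, typename OutputIt>
OutputIt utf8_to_utf32(InputIt first, InputIt last, OutputIt out) {
    while (first != last) {
        const unsigned char lead = static_cast<unsigned char>(*first++);
        char32_t cp;
        int trailing;  // number of continuation bytes that must follow
        if (lead < 0x80)                { cp = lead;        trailing = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; trailing = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trailing = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trailing = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        for (int i = 0; i < trailing; ++i) {
            if (first == last)
                throw std::runtime_error("truncated UTF-8 sequence");
            const unsigned char c = static_cast<unsigned char>(*first++);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong encodings, surrogates, and values past U+10FFFF.
        static const char32_t min_cp[] = {0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[trailing] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");

        *out++ = cp;
    }
    return out;
}

// Encode one code point as UTF-16 (one unit, or a surrogate pair).
template <typename OutputIt>
OutputIt append_utf16(char32_t cp, OutputIt out) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
        *out++ = static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;
        *out++ = static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
        *out++ = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
    }
    return out;
}

// Convenience wrapper: UTF-8 std::string to std::u16string.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u32string code_points;
    code_points.reserve(in.size());  // never more code points than bytes
    utf8_to_utf32(in.begin(), in.end(), std::back_inserter(code_points));

    std::u16string out;
    out.reserve(in.size());  // UTF-16 never needs more units than UTF-8 bytes
    for (char32_t cp : code_points)
        append_utf16(cp, std::back_inserter(out));
    return out;
}
```

With the question's input, utf8_to_utf16(n) should give a single code unit, 0x9AA8. Where wchar_t is 32 bits wide, utf8_to_utf32 can also write straight into a std::wstring through std::back_inserter. The reserve calls are only capacity hints, so nothing is sized by guessing and then trimmed.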
Upvotes: 4
Reputation:
It seems pretty obvious that he was smoking weed. As for the codepage conversions, look no further than iconv!
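For what the iconv route might look like, here is a rough sketch. It assumes a POSIX system whose iconv() takes char** for the input pointer (glibc does; some older platforms want const char**) and that the encoding names "UTF-8" and "UTF-16LE" are recognised. The function name iconv_utf8_to_utf16le and the 256-byte chunk size are arbitrary choices for illustration.

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Convert UTF-8 to UTF-16 with iconv, draining a fixed-size chunk repeatedly
// so the output length never has to be known up front.
std::u16string iconv_utf8_to_utf16le(const std::string& in) {
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");  // arguments: (to, from)
    if (cd == reinterpret_cast<iconv_t>(-1))
        throw std::runtime_error("iconv_open failed");

    std::string bytes;  // raw little-endian UTF-16 bytes
    char* src = const_cast<char*>(in.data());  // iconv does not modify the input
    std::size_t src_left = in.size();
    char chunk[256];

    while (src_left > 0) {
        char* dst = chunk;
        std::size_t dst_left = sizeof chunk;
        const std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        const int err = errno;  // save before anything else can touch errno
        bytes.append(chunk, sizeof chunk - dst_left);
        // E2BIG only means the chunk filled up; anything else is a real error.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: malformed or truncated input");
        }
    }
    iconv_close(cd);

    // Reassemble byte pairs explicitly, so host endianness does not matter.
    std::u16string out;
    out.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}
```

On some systems (macOS, for one) this may need linking with -liconv. Asking for "UTF-16LE" rather than plain "UTF-16" avoids having a byte order mark prepended to the output.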
Upvotes: 2