Does the C++ Standard Template Library (STL) provide any method to convert a UTF-8 encoded byte buffer into a std::wstring?
For example:
const unsigned char* szBuf = (const unsigned char*) "d\xC3\xA9j\xC3\xA0 vu";
std::wstring str = method(szBuf); // Should assign "déjà vu" to str
I want to avoid having to implement my own UTF8 conversion code, like this:
const unsigned char* pch = szBuf;
while (*pch != 0)
{
    if ((*pch & 0x80) == 0)
    {
        // Single-byte (ASCII) sequence
        str += *pch++;
    }
    else if ((*pch & 0xE0) == 0xC0 && (pch[1] & 0xC0) == 0x80)
    {
        // Two-byte sequence: 110xxxxx 10xxxxxx
        wchar_t ch = (((*pch & 0x1F) >> 2) << 8) +
                     ((*pch & 0x03) << 6) +
                     (pch[1] & 0x3F);
        str += ch;
        pch += 2;
    }
    else if (...)
    {
        // other cases omitted
    }
}
EDIT: Thanks for your comments and the answer. This code fragment performs the desired conversion:
std::wstring_convert<std::codecvt_utf8<wchar_t>,wchar_t> convert;
str = convert.from_bytes((const char*)szBuf);
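For anyone who wants to drop this into a project, here is a minimal sketch of the same conversion wrapped in a function. The name utf8_to_wstring is hypothetical, not something the standard library provides. Two caveats to keep in mind: std::wstring_convert and std::codecvt_utf8 are deprecated as of C++17, so a compiler may warn about them, and on platforms where wchar_t is 16 bits (Windows, for example) std::codecvt_utf8<wchar_t> cannot represent code points outside the Basic Multilingual Plane.

```cpp
#include <codecvt>  // std::codecvt_utf8 (deprecated since C++17)
#include <locale>   // std::wstring_convert (deprecated since C++17)
#include <string>

// Hypothetical helper: converts a UTF-8 encoded std::string to std::wstring.
// from_bytes reports a failed conversion by throwing std::range_error,
// because no fallback error string is passed to the converter.
std::wstring utf8_to_wstring(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert;
    return convert.from_bytes(utf8);
}
```

Calling utf8_to_wstring("d\xC3\xA9j\xC3\xA0 vu") should give the seven-character wide string "déjà vu".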
Upvotes: 0
Views: 3130
Reputation: 140806
In C++11 you can use std::codecvt_utf8. If you don't have that, you may be able to persuade iconv to do what you want; unfortunately, that's not ubiquitous either, not all implementations that have it support UTF-8, and I'm not aware of any way to find out the appropriate thing to pass to iconv_open to do a conversion from wchar_t.
If you don't have either of those things, your best bet is a third-party library such as ICU. Surprisingly, Boost does not appear to have anything for this purpose, although I could have missed it.
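If pulling in a library is not an option either, a hand-rolled decoder is not very long. The following is only a sketch under some assumptions: the function name decode_utf8 is hypothetical, it returns std::u32string rather than std::wstring to sidestep the question of how wide wchar_t is, and it throws std::invalid_argument on malformed input instead of substituting a replacement character.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes a UTF-8 byte string into UTF-32 code points.
std::u32string decode_utf8(const std::string& in)
{
    // Smallest code point that legitimately needs a sequence of each length;
    // anything smaller is an overlong encoding.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and adds six bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        out += cp;
        i += len;
    }
    return out;
}
```

On a platform where wchar_t is 32 bits the result can be copied into a std::wstring element by element; where wchar_t is 16 bits, code points above U+FFFF would additionally need to be split into surrogate pairs.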
Upvotes: 1