user128300

Convert UTF8 encoded byte buffer to wstring?

Does the C++ Standard Template Library (STL) provide any method to convert a UTF8 encoded byte buffer into a wstring?

For example:

const unsigned char* szBuf = (const unsigned char*) "d\xC3\xA9j\xC3\xA0 vu";
std::wstring str = method(szBuf); // Should assign L"déjà vu" to str

I want to avoid having to implement my own UTF8 conversion code, like this:

const unsigned char* pch = szBuf;    
while (*pch != 0)
{
    if ((*pch & 0x80) == 0)
    {
        str += *pch++;
    }
    else if ((*pch & 0xE0) == 0xC0 && (pch[1] & 0xC0) == 0x80)
    {
        wchar_t ch = (((*pch & 0x1F) >> 2) << 8) +
            ((*pch & 0x03) << 6) +
            (pch[1] & 0x3F);
        str += ch;
        pch += 2;
    }
    else if (...)
    {
        // other cases omitted
    }
}

EDIT: Thanks for your comments and the answer. This code fragment performs the desired conversion:

// Requires <locale> and <codecvt> (C++11; deprecated since C++17).
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert;
str = convert.from_bytes(reinterpret_cast<const char*>(szBuf));
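
For anyone who wants to drop this into a project, here is a minimal sketch of the same conversion wrapped in a function. The name utf8_to_wstring is my own, not part of the standard library, and the sketch assumes the buffer is NUL-terminated. As far as I know, from_bytes throws std::range_error on malformed input when the converter is constructed without an error string, so callers may want to catch that.

```cpp
#include <codecvt>  // std::codecvt_utf8 (C++11, deprecated since C++17)
#include <locale>   // std::wstring_convert
#include <string>

// Hypothetical helper (name chosen for illustration only).
// Converts a NUL-terminated UTF-8 byte buffer to a std::wstring.
// May throw std::range_error if the input is not valid UTF-8.
std::wstring utf8_to_wstring(const unsigned char* buf)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert;
    return convert.from_bytes(reinterpret_cast<const char*>(buf));
}
```

Calling utf8_to_wstring(szBuf) with the buffer from the question should give a seven-character wide string. One caveat: on platforms where wchar_t is 16 bits (Windows, for instance), std::codecvt_utf8<wchar_t> produces UCS-2 and cannot represent characters outside the Basic Multilingual Plane; std::codecvt_utf8_utf16<wchar_t> produces UTF-16 with surrogate pairs instead.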

Upvotes: 0

Views: 3130

Answers (1)

zwol

Reputation: 140806

In C++11 you can use std::codecvt_utf8 together with std::wstring_convert. If you don't have that, you may be able to persuade iconv to do what you want. Unfortunately, iconv is not ubiquitous either, not all implementations that have it support UTF-8, and I'm not aware of any way to find out the appropriate thing to pass to iconv_open for a conversion to or from wchar_t.

If you don't have either of those, your best bet is a third-party library such as ICU. Surprisingly, Boost does not appear to have anything for this purpose, although I could have missed it.

Upvotes: 1
