user1088794

Reputation: 93

C++ string encoding UTF8 / unicode

I am trying to send the character "Т" (not a normal capital T; Unicode decimal value 1058, i.e. U+0422) from C++ to VB.

However, with the method below, Message is returned to VB and appears as "Ð¢", which is the UTF-8 bytes of the above character displayed as ANSI.

#if defined(_MSC_VER) && _MSC_VER > 1310
# define utf8(str)  ConvertToUTF8(L##str)
const char * ConvertToUTF8(const wchar_t * pStr) {
    static char szBuf[1024];
    WideCharToMultiByte(CP_UTF8, 0, pStr, -1, szBuf, sizeof(szBuf), NULL, NULL);
    return szBuf;
}
#else
# define utf8(str)  str
#endif


BSTR _stdcall chatTest()
{
    BSTR Message;
    CString temp("temp test");
    temp+=utf8("\u0422");
    int len = temp.GetLength();
    Message = SysAllocStringByteLen ((LPCTSTR)temp, len+1 );
    return Message;
}

If I just do temp+=("\u0422"); without the utf8 function, it sends the data as "?", and it is actually a question mark. (Sometimes Unicode characters show up as question marks in VB but still have the correct Unicode decimal value. That is not the case here: the character really is changed to a question mark.)

In VB, if I output the String variable that holds the data from Message (when it is "Ð¢") to a text file, it appears as "Т".

So as far as I can tell, it's in UTF-8 in C++, then somehow gets converted to ANSI in VB (or before it's sent?), and then when output to a file it's changed back to UTF-8?

I just need to keep the "Т" intact when sending from C++ to VB. I know VB strings can hold that character because from another source within VB I am able to store it (it appears as a "?", but has the proper Unicode decimal value).

Any help is greatly appreciated.

Thanks

Upvotes: 1

Views: 2426

Answers (1)

Mark Ransom

Reputation: 308091

A BSTR is not UTF-8; it's UTF-16, which is what you get with the L"" prefix. Take out the UTF-8 conversion and use CStringW, and use LPCWSTR instead of LPCTSTR.
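
A minimal sketch of that change, under some assumptions not stated in the question: the project is built with ATL so that CStringW is available from <atlstr.h>, and the VB side accepts the returned BSTR as UTF-16 without converting it (how VB declares chatTest is not shown). The helpers cyrillicTeUtf8 and cyrillicTeUtf16 are hypothetical names added only to show the two encodings of U+0422 side by side. The sketch also swaps SysAllocStringByteLen for SysAllocString, which takes a wide string and copies it up to the terminator.

```cpp
#include <string>

// U+0422 (decimal 1058), written with explicit escapes so the encoding of
// the source file cannot affect the result.

// UTF-8 form: two bytes, 0xD0 0xA2. Displayed as Windows-1252 text, these
// two bytes show up as the two characters seen in the question.
std::string cyrillicTeUtf8()
{
    return std::string("\xD0\xA2");
}

// UTF-16 form: one code unit, 0x0422. This is the form a BSTR stores.
std::u16string cyrillicTeUtf16()
{
    return std::u16string(u"\u0422");
}

#ifdef _WIN32
#include <atlstr.h>   // CStringW (assumes an ATL project)
#include <oleauto.h>  // SysAllocString

// Reworked chatTest() from the question: no UTF-8 round trip, wide strings
// all the way through.
BSTR __stdcall chatTest()
{
    CStringW temp(L"temp test");
    temp += L"\u0422";            // wide literal, already UTF-16 on Windows
    return SysAllocString(temp);  // the caller owns and frees the BSTR
}
#endif
```

The Windows-only part is guarded with _WIN32 so the encoding helpers can still be compiled elsewhere. If the string could contain embedded NUL characters, SysAllocStringLen with an explicit character count would be the safer call.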

Upvotes: 1