Reputation:
I have a variant bstr that was pulled from MSXML DOM, so it is in UTF-16. I'm trying to figure out what default encoding occurs with this conversion:
VARIANT vtNodeValue;
pNode->get_nodeValue(&vtNodeValue);
string strValue = (char*)_bstr_t(vtNodeValue);
From testing, I believe that the default encoding is either Windows-1252 or ASCII, but I am not sure.
Btw, this is the chunk of code that I am fixing: I am converting the variant to a wstring and then going to a multi-byte encoding with a call to WideCharToMultiByte.
Thanks!
Upvotes: 8
Views: 4405
Reputation: 23168
The operator char* method calls _com_util::ConvertBSTRToString(). The documentation is pretty unhelpful, but I assume it uses the current locale settings to do the conversion.
Update:
Internally, _com_util::ConvertBSTRToString() calls WideCharToMultiByte, passing zero for all the code-page and default character parameters. This is the same as passing CP_ACP, which means to use the system's current ANSI code-page setting (not the current thread setting).
If you want to avoid losing data, you should probably call WideCharToMultiByte directly and use CP_UTF8. You can still treat the string as a null-terminated single-byte string and use std::string; you just can't treat bytes as characters.
Upvotes: 10
Reputation: 181998
std::string by itself doesn't specify/contain any encoding. It is merely a sequence of bytes. The same holds for std::wstring, which is merely a sequence of wchar_ts (double-byte words, on Win32).
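To illustrate the "sequence of bytes" point, here is a small sketch, assuming the std::string holds valid UTF-8. The helper name CountUtf8CodePoints is hypothetical. It shows that size() reports bytes, while the number of encoded characters has to be derived from the encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a string
// that is assumed to contain valid UTF-8.
std::size_t CountUtf8CodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes have the form 10xxxxxx;
        // every other byte starts a new code point.
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the five bytes "\xC3\xA9t\xC3\xA9" (the UTF-8 encoding of "été"), size() is 5 while the helper counts 3 code points.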
By converting _bstr_t to a char* through its operator char*, you'll simply get a pointer to the raw data. According to MSDN, this data consists of wide characters, that is, wchar_ts, which represent UTF-16.
I'm surprised that it actually works to construct a std::string from this; you should not get past the first zero byte (which occurs soon, if your original string is English).
But since wstring is a string of wchar_t, you should be able to construct one directly from the _bstr_t, as follows:
_bstr_t tmp(vtNodeValue);
wstring strValue((wchar_t*)tmp, tmp.length());
(I'm not sure about length; is it the number of bytes or the number of characters?) Then, you'll have a wstring that's encoded in UTF-16 on which you can call WideCharToMultiByte.
Upvotes: 0