I'm converting wchar_t* to UTF-8 like this:
char* DupString(wchar_t* t)
{
    if(!t) return strdup("");
    USES_CONVERSION;
    _acp = CP_UTF8;
    return strdup(W2A(t));
}
Normally this works fine, but I've found a Chinese string, "主体", for which the conversion fails.
The macro itself is defined like this:
#define W2A(lpw) (\
    ((_lpw = lpw) == NULL) ? NULL : (\
        (_convert = (lstrlenW(_lpw)+1), \
        (_convert>INT_MAX/2) ? NULL : \
        ATLW2AHELPER((LPSTR) alloca(_convert*sizeof(WCHAR)), _lpw, _convert*sizeof(WCHAR), _acp))))
In my case _convert = 2 + 1 = 3, so the buffer size passed to the helper is 3 * sizeof(WCHAR) = 6 bytes.
Inside AtlW2AHelper (atlconv.h), the call to WideCharToMultiByte returns 0:
_Ret_opt_z_cap_(nChars) inline LPSTR WINAPI AtlW2AHelper(
    _Out_opt_z_cap_(nChars) LPSTR lpa,
    _In_opt_z_ LPCWSTR lpw,
    _In_ int nChars,
    _In_ UINT acp) throw()
{
    ATLASSERT(lpw != NULL);
    ATLASSERT(lpa != NULL);
    if (lpa == NULL || lpw == NULL)
        return NULL;
    // verify that no illegal character present
    // since lpa was allocated based on the size of lpw
    // don't worry about the number of chars
    *lpa = '\0';
    int ret = WideCharToMultiByte(acp, 0, lpw, -1, lpa, nChars, NULL, NULL);
    if(ret == 0)
    {
        ATLASSERT(FALSE);
        return NULL;
    }
    return lpa;
}
@err in the Watch window shows error code 122, ERROR_INSUFFICIENT_BUFFER.
If I increase the buffer by one byte (nChars = 7), the conversion succeeds: the buffer then holds 6 bytes of UTF-8 plus the terminating zero, 7 bytes in total.
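To make the numbers concrete, here is a small portable sketch (not ATL code) that counts the UTF-8 bytes a UTF-16 string needs and compares them with the size W2A passes to AtlW2AHelper. Utf8Length and W2ABufferSize are hypothetical helpers, and char16_t stands in for the 2-byte Windows WCHAR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: UTF-8 bytes needed for a UTF-16 string, not counting
// the terminating zero. A valid surrogate pair becomes one 4-byte sequence;
// an unpaired surrogate is counted as 3 bytes (the size of U+FFFD).
std::size_t Utf8Length(const std::u16string& s)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        const char16_t c = s[i];
        if (c < 0x80)
            bytes += 1;
        else if (c < 0x800)
            bytes += 2;
        else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
                 s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
        {
            bytes += 4; // high + low surrogate -> one code point above U+FFFF
            ++i;        // skip the low surrogate
        }
        else
            bytes += 3; // rest of the BMP, e.g. CJK ideographs
    }
    return bytes;
}

// Hypothetical helper: the byte count W2A hands to AtlW2AHelper,
// (length + 1) * sizeof(WCHAR), with WCHAR assumed to be 2 bytes as on Windows.
std::size_t W2ABufferSize(const std::u16string& s)
{
    return (s.size() + 1) * 2;
}
```

For u"\u4E3B\u4F53" ("主体"), Utf8Length returns 6, so 7 bytes are needed once the terminating zero is added, while W2ABufferSize returns only 6.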
Is this a bug in the W2A macro, with the terminating zero not taken into account?
Has anyone seen a similar problem?
I'm using Visual Studio 2010; I'm not sure whether the problem exists in other Visual Studio versions as well.
Some header files seem to fix this issue, for example:
https://github.com/kxproject/kx-audio-driver/blob/master/h/gui/kDefs.h
However, that is a third-party project, not Visual Studio itself.
Copied from an answer on the Microsoft forums:
Have you considered using the improved ATL7 macro? https://msdn.microsoft.com/en-us/library/87zae4a3.aspx#atl70stringconversionclassesmacros
CW2A pA( pW, CP_UTF8 );
It seems to assume at most 4 bytes per character, rather than the 2 bytes the old macro assumes.
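A quick portable check of that reasoning, independent of ATL: the sketch compares, for every Unicode scalar value, its UTF-8 length with the number of UTF-16 units (WCHARs) it occupies. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: UTF-8 sequence length for one Unicode code point.
std::size_t Utf8BytesForCodePoint(char32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4; // U+10000..U+10FFFF, stored in UTF-16 as a surrogate pair
}

// Hypothetical helper: UTF-16 code units (WCHARs) for the same code point.
std::size_t Utf16UnitsForCodePoint(char32_t cp)
{
    return cp < 0x10000 ? 1 : 2;
}

// Returns true if k bytes per UTF-16 unit is enough for every scalar value.
bool BytesPerUnitSuffices(std::size_t k)
{
    for (char32_t cp = 0; cp <= 0x10FFFF; ++cp)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue; // surrogate code points are not scalar values
        if (Utf8BytesForCodePoint(cp) > k * Utf16UnitsForCodePoint(cp))
            return false;
    }
    return true;
}
```

Under these assumptions BytesPerUnitSuffices(2) is false, because every character from U+0800 upward (CJK included) needs 3 bytes for a single WCHAR. BytesPerUnitSuffices(3) and BytesPerUnitSuffices(4) are both true, so the 4-byte assumption leaves some headroom.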
CW2A also handles memory more conveniently, because its destructor releases the conversion buffer. The converted pointer is therefore valid only while the CW2A object is alive:
char* pStr = NULL;  // CW2A converts to LPSTR (char*), not wchar_t*
{
    CW2A pA( pW, CP_UTF8 );
    pStr = pA;
    // pStr is valid
}
// pStr is invalid: pA's destructor has released the buffer
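A portable sketch of the same ownership rule, for readers without ATL at hand. ScopedNarrowString and CopyOut are hypothetical names. The class only imitates CW2A's "buffer freed in the destructor" behaviour and simply copies an already-narrow string, with no UTF-16 conversion. The safe pattern is to copy the text out (here into a std::string) before the owner goes out of scope.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for CW2A's ownership model: the object owns a heap
// buffer and frees it in its destructor. No conversion is performed here.
class ScopedNarrowString
{
public:
    explicit ScopedNarrowString(const char* src)
    {
        assert(src != nullptr);
        const std::size_t n = std::strlen(src) + 1; // include the terminating zero
        m_buf.reset(new char[n]);
        std::memcpy(m_buf.get(), src, n);
    }

    // Like CW2A's conversion operator, hands out a pointer into the owned buffer.
    operator char*() const { return m_buf.get(); }

private:
    std::unique_ptr<char[]> m_buf; // released when the object goes out of scope
};

std::string CopyOut(const char* src)
{
    std::string result;
    {
        ScopedNarrowString converted(src);
        const char* p = converted; // points into 'converted', valid only in this block
        result = p;                // copy the text while the buffer still exists
    }
    // The buffer has been freed here; only 'result' still holds the text.
    return result;
}
```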
W2A mistakenly assumes that two bytes per character are enough for the conversion. Your string expands to a UTF-8 string of seven bytes, including the terminating zero, so WideCharToMultiByte fails with an insufficient buffer, which is what you already found.
It looks like a bug that you can fix yourself in the ATL source, in atlconv.h (Microsoft will not update VS 2010, and I suppose it may already be too late even for VS 2015):
#define W2A(lpw) (\
    ((_lpw = lpw) == NULL) ? NULL : (\
        (_convert = (static_cast<int>(wcslen(_lpw))+1), \
        (_convert>INT_MAX/4) ? NULL : \
        ATLW2AHELPER((LPSTR) alloca(_convert*4), _lpw, _convert*4, _acp)))) // allocate AND pass 4 bytes per WCHAR (was _convert*sizeof(WCHAR) for both)
Alternatively, you can use the newer CW2A conversion classes, which already allocate larger buffers (4 bytes per character; see CW2AEX::Init):
static const LPCWSTR g_psz = L"主体";
LPCSTR psz = _strdup(CW2A(g_psz, CP_UTF8));
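If you would rather not depend on a fixed bytes-per-character guess at all, the usual alternative is to measure first and then allocate exactly. On Windows that is typically done by first calling WideCharToMultiByte(CP_UTF8, 0, t, -1, NULL, 0, NULL, NULL), which returns the required size including the terminating zero, and then doing the real conversion. The sketch below shows the same two-pass idea portably, with char16_t standing in for WCHAR. DupUtf8, NextCodePoint and PutUtf8 are hypothetical names, and replacing unpaired surrogates with U+FFFD is an assumed policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: writes the UTF-8 form of one code point to 'out'
// when 'out' is non-null, and returns its length in bytes either way.
static std::size_t PutUtf8(char32_t cp, char* out)
{
    std::size_t n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (out)
    {
        static const unsigned char lead[] = {0, 0x00, 0xC0, 0xE0, 0xF0};
        for (std::size_t i = n - 1; i > 0; --i)
        {
            out[i] = static_cast<char>(0x80 | (cp & 0x3F)); // continuation byte
            cp >>= 6;
        }
        out[0] = static_cast<char>(lead[n] | cp); // lead byte with length marker
    }
    return n;
}

// Hypothetical helper: reads one code point from a zero-terminated UTF-16
// string, combining surrogate pairs; an unpaired surrogate becomes U+FFFD.
static char32_t NextCodePoint(const char16_t*& p)
{
    char32_t c = *p++;
    if (c >= 0xD800 && c <= 0xDBFF && *p >= 0xDC00 && *p <= 0xDFFF)
        return 0x10000 + ((c - 0xD800) << 10) + (*p++ - 0xDC00);
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0xFFFD;
    return c;
}

// Two-pass conversion in the spirit of DupString: measure, allocate exactly,
// then encode. Returns a malloc'd, zero-terminated UTF-8 copy (release it
// with free()), or nullptr if allocation fails. A null input yields "".
char* DupUtf8(const char16_t* t)
{
    if (!t) t = u"";
    std::size_t size = 1; // terminating zero
    for (const char16_t* p = t; *p; )
        size += PutUtf8(NextCodePoint(p), nullptr); // pass 1: count bytes

    char* out = static_cast<char*>(std::malloc(size));
    if (!out) return nullptr;
    char* w = out;
    for (const char16_t* p = t; *p; )
        w += PutUtf8(NextCodePoint(p), w);          // pass 2: encode
    *w = '\0';
    return out;
}
```

As with the _strdup version, the caller owns the result and must release it with free().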