I have a Windows application where string types are WCHAR*. I need to convert these into char* for passing into a C API. I am using the MultiByteToWideChar and WideCharToMultiByte functions to perform the conversion.
But for some reason, the conversion is not correct. I am seeing a lot of gibberish in the output. The following code is a modified version of the code found in this Stack Overflow answer.
#include <windows.h>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <iostream>

WCHAR* convert_to_wstring(const char* str)
{
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, str, (int)strlen(str), NULL, 0);
    WCHAR* wstrTo = (WCHAR*)malloc(size_needed);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)strlen(str), wstrTo, size_needed);
    return wstrTo;
}

char* convert_from_wstring(const WCHAR* wstr)
{
    int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr, (int)wcslen(wstr), NULL, 0, NULL, NULL);
    char* strTo = (char*)malloc(size_needed);
    WideCharToMultiByte(CP_UTF8, 0, wstr, (int)wcslen(wstr), strTo, size_needed, NULL, NULL);
    return strTo;
}

int main()
{
    const WCHAR* wText = L"Wide string";
    const char* text = convert_from_wstring(wText);
    std::cout << text << "\n";
    std::cout << convert_to_wstring("Multibyte string") << "\n";
    return 0;
}
Your conversion functions are buggy.
The return value of MultiByteToWideChar() is a number of wide characters, not a number of bytes, which is how you are currently treating it. You need to multiply the value by sizeof(WCHAR) when calling malloc().
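To make the units concrete, here is a minimal sketch of the allocation arithmetic. wide_buffer_bytes() and alloc_wide_buffer() are hypothetical helpers, not Windows APIs, and the sketch uses wchar_t in place of WCHAR (on Windows, WCHAR is a typedef for wchar_t) so that it compiles on any platform. Both reserve one extra wide character for a null terminator:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: converts a count of wide characters (what
// MultiByteToWideChar() returns) into the number of bytes malloc() needs.
// The + 1 reserves room for the L'\0' terminator.
std::size_t wide_buffer_bytes(int num_chars)
{
    return (static_cast<std::size_t>(num_chars) + 1) * sizeof(wchar_t);
}

// Hypothetical helper: allocates a zero-filled wide buffer with room for
// num_chars characters plus a terminator. The caller must free() it.
wchar_t* alloc_wide_buffer(int num_chars)
{
    return static_cast<wchar_t*>(
        std::calloc(static_cast<std::size_t>(num_chars) + 1, sizeof(wchar_t)));
}
```

With sizeof(WCHAR) == 2 on Windows, the 16-character "Multibyte string" therefore needs (16 + 1) * 2 = 34 bytes, while the code in the question allocated only 16.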
You are also not taking into account that the return value DOES NOT include space for a null terminator, because you are not passing -1 in the cbMultiByte parameter. Read the MultiByteToWideChar() documentation:
cbMultiByte [in]
Size, in bytes, of the string indicated by the lpMultiByteStr parameter. Alternatively, this parameter can be set to -1 if the string is null-terminated. Note that, if cbMultiByte is 0, the function fails. If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting Unicode string has a terminating null character, and the length returned by the function includes this character.
If this parameter is set to a positive integer, the function processes exactly the specified number of bytes. If the provided size does not include a terminating null character, the resulting Unicode string is not null-terminated, and the returned length does not include this character.
...
Return value
Returns the number of characters written to the buffer indicated by lpWideCharStr if successful. If the function succeeds and cchWideChar is 0, the return value is the required size, in characters, for the buffer indicated by lpWideCharStr.
You are not null-terminating your output string.
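For comparison, here is a minimal sketch of the -1 approach from the documentation quoted above, which lets the API account for the terminator itself. convert_to_wstring_nt() is a hypothetical name, returning NULL on failure is an assumed convention, and the block is wrapped in #ifdef _WIN32 because it only builds on Windows:

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cstdlib>

// Sketch: passing -1 as cbMultiByte tells MultiByteToWideChar() that the
// input is null-terminated, so the count it returns already includes the
// terminator and the output buffer comes back null-terminated.
WCHAR* convert_to_wstring_nt(const char* str)
{
    // First call: query the required size, in WCHARs (terminator included).
    int num_chars = MultiByteToWideChar(CP_UTF8, 0, str, -1, NULL, 0);
    if (num_chars == 0)
        return NULL; // conversion failed; GetLastError() says why

    WCHAR* wstrTo = (WCHAR*) malloc(num_chars * sizeof(WCHAR));
    if (wstrTo && MultiByteToWideChar(CP_UTF8, 0, str, -1, wstrTo, num_chars) == 0)
    {
        // Second call failed: do not hand back a half-written buffer.
        free(wstrTo);
        wstrTo = NULL;
    }
    return wstrTo;
}
#endif
```

A side benefit is that an empty input string works: with -1, the input consists of just the terminating null character, so the size query returns 1 instead of failing the way a length of 0 does.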
The same goes for your convert_from_wstring() function. Read the WideCharToMultiByte() documentation:
cchWideChar [in]
Size, in characters, of the string indicated by lpWideCharStr. Alternatively, this parameter can be set to -1 if the string is null-terminated. If cchWideChar is set to 0, the function fails. If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting character string has a terminating null character, and the length returned by the function includes this character.
If this parameter is set to a positive integer, the function processes exactly the specified number of characters. If the provided size does not include a terminating null character, the resulting character string is not null-terminated, and the returned length does not include this character.
...
Return value
Returns the number of bytes written to the buffer pointed to by lpMultiByteStr if successful. If the function succeeds and cbMultiByte is 0, the return value is the required size, in bytes, for the buffer indicated by lpMultiByteStr.
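A similar sketch of the -1 form for the narrow direction; convert_from_wstring_nt() is again a hypothetical name and the block again only builds on Windows. It additionally passes WC_ERR_INVALID_CHARS (Windows Vista and later, valid only with CP_UTF8 or code page 54936), which makes the call fail on invalid UTF-16 such as an unpaired surrogate instead of silently substituting U+FFFD. Dropping that flag restores the substitution behavior:

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cstdlib>

// Sketch: -1 as cchWideChar makes the returned byte count include the
// terminating '\0'. For CP_UTF8, lpDefaultChar and lpUsedDefaultChar
// (the last two arguments) must be NULL.
char* convert_from_wstring_nt(const WCHAR* wstr)
{
    int num_bytes = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                        wstr, -1, NULL, 0, NULL, NULL);
    if (num_bytes == 0)
        return NULL; // e.g. GetLastError() == ERROR_NO_UNICODE_TRANSLATION

    char* strTo = (char*) malloc(num_bytes);
    if (strTo && WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                     wstr, -1, strTo, num_bytes, NULL, NULL) == 0)
    {
        free(strTo);
        strTo = NULL;
    }
    return strTo;
}
#endif
```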
That being said, your main() code is leaking the allocated strings. Since they are allocated with malloc(), you need to deallocate them with free() when you are done using them.
Also, you cannot pass a WCHAR* string to std::cout. Well, you can, but std::cout has no operator<< for wide-string input. It does have an operator<< for void* input, so it will just end up printing the memory address that the WCHAR* is pointing at, not the actual characters. (Since C++20, the standard explicitly deletes the operator<< overload for a narrow stream and a const wchar_t*, so such code no longer compiles at all.) If you want to output wide strings, use std::wcout instead.
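The difference can be observed without a console by using string streams in place of std::cout and std::wcout. In this sketch, narrow_stream_output() and wide_stream_output() are hypothetical helpers; the explicit const void* cast reproduces the overload that std::cout << wstr selected before C++20:

```cpp
#include <sstream>
#include <string>

// A narrow stream has no overload for wide strings, so a wchar_t* ends up
// in the const void* overload, which prints the pointer value (an address).
std::string narrow_stream_output(const wchar_t* w)
{
    std::ostringstream os;
    os << static_cast<const void*>(w);
    return os.str();
}

// A wide stream has operator<<(std::wostream&, const wchar_t*),
// which prints the characters themselves.
std::wstring wide_stream_output(const wchar_t* w)
{
    std::wostringstream wos;
    wos << w;
    return wos.str();
}
```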
Try something more like this:
#include <windows.h>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <iostream>

WCHAR* convert_to_wstring(const char* str)
{
    int str_len = (int) strlen(str);
    // Returns a count of WCHARs (not bytes), without a terminator
    int num_chars = MultiByteToWideChar(CP_UTF8, 0, str, str_len, NULL, 0);
    // Scale by sizeof(WCHAR) and reserve one extra WCHAR for the terminator
    WCHAR* wstrTo = (WCHAR*) malloc((num_chars + 1) * sizeof(WCHAR));
    if (wstrTo)
    {
        MultiByteToWideChar(CP_UTF8, 0, str, str_len, wstrTo, num_chars);
        wstrTo[num_chars] = L'\0';
    }
    return wstrTo;
}

CHAR* convert_from_wstring(const WCHAR* wstr)
{
    int wstr_len = (int) wcslen(wstr);
    // Returns a count of bytes, again without a terminator
    int num_chars = WideCharToMultiByte(CP_UTF8, 0, wstr, wstr_len, NULL, 0, NULL, NULL);
    CHAR* strTo = (CHAR*) malloc((num_chars + 1) * sizeof(CHAR));
    if (strTo)
    {
        WideCharToMultiByte(CP_UTF8, 0, wstr, wstr_len, strTo, num_chars, NULL, NULL);
        strTo[num_chars] = '\0';
    }
    return strTo;
}

int main()
{
    const WCHAR* wText = L"Wide string";
    // Non-const pointers, because free() cannot take a pointer to const
    char* text = convert_from_wstring(wText);
    std::cout << text << "\n";
    free(text);
    WCHAR* wtext = convert_to_wstring("Multibyte string");
    std::wcout << wtext << "\n";
    free(wtext);
    return 0;
}
That being said, you really should be using std::string and std::wstring instead of char* and wchar_t* for better memory management:
#include <windows.h>
#include <iostream>
#include <string>

std::wstring convert_to_wstring(const std::string &str)
{
    // length() is a size_t, but the API takes an int
    int num_chars = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), (int) str.length(), NULL, 0);
    std::wstring wstrTo;
    if (num_chars > 0)
    {
        // std::wstring maintains its own terminator, so no + 1 is needed here
        wstrTo.resize(num_chars);
        MultiByteToWideChar(CP_UTF8, 0, str.c_str(), (int) str.length(), &wstrTo[0], num_chars);
    }
    return wstrTo;
}

std::string convert_from_wstring(const std::wstring &wstr)
{
    int num_chars = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int) wstr.length(), NULL, 0, NULL, NULL);
    std::string strTo;
    if (num_chars > 0)
    {
        strTo.resize(num_chars);
        WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int) wstr.length(), &strTo[0], num_chars, NULL, NULL);
    }
    return strTo;
}

int main()
{
    const WCHAR* wText = L"Wide string";
    const std::string text = convert_from_wstring(wText);
    std::cout << text << "\n";
    const std::wstring wtext = convert_to_wstring("Multibyte string");
    std::wcout << wtext << "\n";
    return 0;
}
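If you are using C++11 or later, have a look at the std::wstring_convert class for converting between UTF strings (note that std::wstring_convert and std::codecvt_utf8_utf16 have been deprecated since C++17, so expect deprecation warnings), e.g.: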
#include <codecvt>
#include <locale>
#include <string>

std::wstring convert_to_wstring(const std::string &str)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> conv;
    return conv.from_bytes(str);
}

std::string convert_from_wstring(const std::wstring &wstr)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> conv;
    return conv.to_bytes(wstr);
}
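One practical difference from the Win32 functions is error reporting: wstring_convert::from_bytes() throws std::range_error when the input is not valid UTF-8, unless the converter was constructed with an error string to return instead. A minimal sketch that turns the exception into a bool; try_convert_to_wstring() is a hypothetical name:

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Returns false instead of throwing when str is not valid UTF-8.
// std::wstring_convert / std::codecvt_utf8_utf16 are deprecated since C++17.
bool try_convert_to_wstring(const std::string& str, std::wstring& out)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> conv;
    try
    {
        out = conv.from_bytes(str);
        return true;
    }
    catch (const std::range_error&)
    {
        // Thrown by from_bytes() on an invalid byte sequence
        return false;
    }
}
```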
If you need to interact with other code that is based on char*/wchar_t*, std::string has a constructor that accepts char* input and a c_str() method that can be used for const char* output, and the same goes for std::wstring and wchar_t*.
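For example, a std::string can be passed straight to a C function through c_str(). legacy_c_api_length() below is a hypothetical stand-in for whatever C API you are calling:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C API that only understands const char*.
extern "C" std::size_t legacy_c_api_length(const char* s)
{
    return std::strlen(s);
}

// char* -> std::string: the constructor copies the characters.
std::string from_c(const char* s)
{
    return std::string(s);
}

// std::string -> const char*: c_str() lends a null-terminated pointer
// that stays valid until the string is modified or destroyed.
std::size_t call_c_api(const std::string& utf8)
{
    return legacy_c_api_length(utf8.c_str());
}
```

Because the pointer from c_str() is only borrowed, the C API must not keep it after the call returns.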