Navaneeth K N

Reputation: 15541

How to convert between wide character and multibyte character strings in Windows?

I have a Windows application where the string types are WCHAR*. I need to convert these to char* to pass them to a C API. I am using the MultiByteToWideChar and WideCharToMultiByte functions to perform the conversion.

But for some reason the conversion does not work properly: I see a lot of gibberish in the output. The following code is a modified version of the code found in this Stack Overflow answer.

WCHAR* convert_to_wstring(const char* str)
{
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, str, (int)strlen(str), NULL, 0);
    WCHAR* wstrTo = (WCHAR*)malloc(size_needed);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)strlen(str), wstrTo, size_needed);
    return wstrTo;
}

char* convert_from_wstring(const WCHAR* wstr)
{
    int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr, (int)wcslen(wstr), NULL, 0, NULL, NULL);
    char* strTo = (char*)malloc(size_needed);
    WideCharToMultiByte(CP_UTF8, 0, wstr, (int)wcslen(wstr), strTo, size_needed, NULL, NULL);
    return strTo;
}

int main()
{
    const WCHAR* wText = L"Wide string";
    const char* text = convert_from_wstring(wText);
    std::cout << text << "\n";
    std::cout << convert_to_wstring("Multibyte string") << "\n";
    return 0;
}

Upvotes: 2

Views: 9454

Answers (1)

Remy Lebeau

Reputation: 598329

Your conversion functions are buggy.

The return value of MultiByteToWideChar() is a number of wide characters, not a number of bytes as you are currently treating it. You need to multiply the value by sizeof(WCHAR) when calling malloc().

You are also not taking into account that the return value DOES NOT include space for a null terminator, because you are not passing -1 in the cbMultiByte parameter. Read the MultiByteToWideChar() documentation:

cbMultiByte [in]
Size, in bytes, of the string indicated by the lpMultiByteStr parameter. Alternatively, this parameter can be set to -1 if the string is null-terminated. Note that, if cbMultiByte is 0, the function fails.

If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting Unicode string has a terminating null character, and the length returned by the function includes this character.

If this parameter is set to a positive integer, the function processes exactly the specified number of bytes. If the provided size does not include a terminating null character, the resulting Unicode string is not null-terminated, and the returned length does not include this character.

...

Return value

Returns the number of characters written to the buffer indicated by lpWideCharStr if successful. If the function succeeds and cchWideChar is 0, the return value is the required size, in characters, for the buffer indicated by lpWideCharStr.

You are not null-terminating your output string.

The same goes for your convert_from_wstring() function. Read the WideCharToMultiByte() documentation:

cchWideChar [in]
Size, in characters, of the string indicated by lpWideCharStr. Alternatively, this parameter can be set to -1 if the string is null-terminated. If cchWideChar is set to 0, the function fails.

If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting character string has a terminating null character, and the length returned by the function includes this character.

If this parameter is set to a positive integer, the function processes exactly the specified number of characters. If the provided size does not include a terminating null character, the resulting character string is not null-terminated, and the returned length does not include this character.

...

Return value

Returns the number of bytes written to the buffer pointed to by lpMultiByteStr if successful. If the function succeeds and cbMultiByte is 0, the return value is the required size, in bytes, for the buffer indicated by lpMultiByteStr.

Your main() code is also leaking the allocated strings. Since they are allocated with malloc(), you need to deallocate them with free() when you are done using them.
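
If you would rather not call free() by hand, one option is to hand the pointer to a std::unique_ptr with a deleter that calls free(). This is only a sketch: FreeDeleter, CStringPtr, make_malloc_string and make_owned_string are hypothetical names, and make_malloc_string merely stands in for a converter that returns a malloc()'d string:

```cpp
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter that releases memory obtained from malloc().
struct FreeDeleter
{
    void operator()(void* p) const noexcept { std::free(p); }
};

// Owning pointer for malloc()-allocated narrow C strings.
using CStringPtr = std::unique_ptr<char, FreeDeleter>;

// Hypothetical stand-in for a converter that returns a malloc()'d string.
char* make_malloc_string(const char* src)
{
    std::size_t len = std::strlen(src);
    char* out = static_cast<char*>(std::malloc(len + 1));
    if (out)
    {
        std::memcpy(out, src, len + 1); // copies the terminator as well
    }
    return out;
}

// Take ownership immediately so free() runs on every exit path.
CStringPtr make_owned_string(const char* src)
{
    return CStringPtr(make_malloc_string(src));
}
```

The caller reads the characters through get() and never needs to remember the matching free().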

Also, passing a WCHAR* string to std::cout does not do what you want. std::cout has no operator<< for wide string input, but it does have an operator<< for void* input, so it ends up outputting the memory address that the WCHAR* points at, not the actual characters. If you want to output wide strings, use std::wcout instead.
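
As a small illustration, a wide stream treats a const wchar_t* as text. The sketch below uses std::wostringstream instead of std::wcout so the result can be inspected as a value; format_wide is a hypothetical helper name:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: format a wide C string through a wide stream, whose
// operator<< treats const wchar_t* as characters rather than as an address.
std::wstring format_wide(const wchar_t* text)
{
    std::wostringstream out;
    out << L"[" << text << L"]";
    return out.str();
}
```

std::wcout is a wide output stream as well, so the same operator<< overload is the one that gets picked there.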

Try something more like this:

WCHAR* convert_to_wstring(const char* str)
{
    int str_len = (int) strlen(str);
    int num_chars = MultiByteToWideChar(CP_UTF8, 0, str, str_len, NULL, 0);
    WCHAR* wstrTo = (WCHAR*) malloc((num_chars + 1) * sizeof(WCHAR));
    if (wstrTo)
    {
        MultiByteToWideChar(CP_UTF8, 0, str, str_len, wstrTo, num_chars);
        wstrTo[num_chars] = L'\0';
    }
    return wstrTo;
}

CHAR* convert_from_wstring(const WCHAR* wstr)
{
    int wstr_len = (int) wcslen(wstr);
    int num_chars = WideCharToMultiByte(CP_UTF8, 0, wstr, wstr_len, NULL, 0, NULL, NULL);
    CHAR* strTo = (CHAR*) malloc((num_chars + 1) * sizeof(CHAR));
    if (strTo)
    {
        WideCharToMultiByte(CP_UTF8, 0, wstr, wstr_len, strTo, num_chars, NULL, NULL);
        strTo[num_chars] = '\0';
    }
    return strTo;
}

int main()
{
    const WCHAR* wText = L"Wide string";
    CHAR* text = convert_from_wstring(wText); // non-const so it can be passed to free()
    if (text)
    {
        std::cout << text << "\n";
        free(text);
    }

    WCHAR* wtext = convert_to_wstring("Multibyte string");
    if (wtext)
    {
        std::wcout << wtext << "\n";
        free(wtext);
    }

    return 0;
}

That being said, you really should be using std::string and std::wstring instead of char* and wchar_t* for better memory management:

std::wstring convert_to_wstring(const std::string &str)
{
    int str_len = (int) str.length(); // the API takes an int length
    int num_chars = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), str_len, NULL, 0);
    std::wstring wstrTo;
    if (num_chars > 0)
    {
        wstrTo.resize(num_chars);
        MultiByteToWideChar(CP_UTF8, 0, str.c_str(), str_len, &wstrTo[0], num_chars);
    }
    return wstrTo;
}

std::string convert_from_wstring(const std::wstring &wstr)
{
    int wstr_len = (int) wstr.length(); // the API takes an int length
    int num_chars = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), wstr_len, NULL, 0, NULL, NULL);
    std::string strTo;
    if (num_chars > 0)
    {
        strTo.resize(num_chars);
        WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), wstr_len, &strTo[0], num_chars, NULL, NULL);
    }
    return strTo;
}

int main()
{
    const WCHAR* wText = L"Wide string";
    const std::string text = convert_from_wstring(wText);
    std::cout << text << "\n";

    const std::wstring wtext = convert_to_wstring("Multibyte string");
    std::wcout << wtext << "\n";

    return 0;
}

If you are using C++11 or later, have a look at the std::wstring_convert class for converting between UTF strings (note that it is deprecated as of C++17), e.g.:

std::wstring convert_to_wstring(const std::string &str)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> conv;
    return conv.from_bytes(str);
}

std::string convert_from_wstring(const std::wstring &wstr)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> conv;
    return conv.to_bytes(wstr);
}

If you need to interact with other code that is based on char*/wchar_t*, std::string has a constructor that accepts char* input and a c_str() method that can be used for char* output, and the same goes for std::wstring and wchar_t*.
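
A rough sketch of that interop, for the char* case. legacy_length and legacy_name are hypothetical stand-ins for whatever C API you are calling, not real library functions:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style consumer: takes a null-terminated char* string.
std::size_t legacy_length(const char* text)
{
    return std::strlen(text);
}

// Hypothetical C-style producer: returns a pointer to a static string.
const char* legacy_name()
{
    return "Multibyte string";
}

// std::string -> char*: c_str() yields a null-terminated const char*
// that stays valid as long as the string is not modified or destroyed.
std::size_t call_with_string(const std::string& s)
{
    return legacy_length(s.c_str());
}

// char* -> std::string: the constructor copies the characters, so the
// result does not depend on the lifetime of the original buffer.
std::string wrap_result()
{
    return std::string(legacy_name());
}
```

The wide versions look the same with std::wstring, wchar_t* and the corresponding wide C functions.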

Upvotes: 12
