Evan Carslake

Reputation: 2359

C++ Issue, Converting wchar_t* to string

I am having a problem here. This is in Unicode. I have a stringtable that has values in it, separated by ;. I've been at this all day and I always end up with immediate runtime errors.

Stringtable looks like:

`blah;blah;foo;bar;car;star`

Then the code:

// More than enough size for this
const int bufferSize = 2048;

// Resource ID to a StringTable
int resid = IDS_MAP;
wchar_t readMap[bufferSize];            
resid = LoadString(NULL, resid, readMap, bufferSize);  

wchar_t* line;
line = wcstok(readMap,L";");

while (line != NULL) {

    line = wcstok(NULL,L";");
    wstring wstr(line); // Problem
    string str(wstr.begin(), wstr.end()); // Problem

    MessageBox(0,line,0,0); // No problem
}

The trouble starts when I try to convert wchar_t* line to a wstring and then to a string. If I comment out those two lines, it runs fine and the message box shows properly.

Any ideas? Asking this question here was my last resort. Thanks.

Upvotes: 0

Views: 708

Answers (1)

Remy Lebeau

Reputation: 598319

This statement:

line = wcstok(readMap,L";");

Reads the first delimited line in the buffer. OK.

However, in your loop, this statement:

line = wcstok(NULL,L";");

Is at the top of the loop and is thus throwing away that first line on the 1st iteration and then reading the next delimited line. Eventually, your loop will reach the end of the buffer and wcstok() will return NULL, but you are not checking for that condition before using line:

line = wcstok(readMap,L";"); // <-- reads the first line

while (line != NULL) {

    line = wcstok(NULL,L";"); // <-- 1st iteration throws away the first line
    wstring wstr(line); // <-- line will be NULL on last iteration

    //...
}

The line = wcstok(NULL,L";"); statement needs to be moved to the bottom of the loop instead:

wchar_t* line = wcstok(readMap, L";");

while (line != NULL)
{
    // use line as needed...

    line = wcstok(NULL, L";");
}

I would suggest changing the while loop into a for loop to enforce that:

for (wchar_t* line = wcstok(readMap, L";"); (line != NULL); line = wcstok(NULL, L";"))
{
    // use line as needed...
}
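
As a rough sketch of that loop shape in portable code: the standard std::wcstok() in <cwchar> takes a third parameter that holds the tokenizer state, unlike the older two-argument form used above. The helper name splitTokens is made up for illustration.

```cpp
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): splits a mutable,
// null-terminated wide buffer on ';' and copies each token out.
// wcstok() writes terminators into the buffer, so the input is modified.
std::vector<std::wstring> splitTokens(wchar_t* buffer)
{
    std::vector<std::wstring> tokens;
    wchar_t* state = nullptr; // tokenizer position for the standard overload

    for (wchar_t* line = std::wcstok(buffer, L";", &state);
         line != nullptr;
         line = std::wcstok(nullptr, L";", &state))
    {
        // The loop condition has already rejected NULL, so this is safe.
        tokens.emplace_back(line);
    }
    return tokens;
}
```

Because the NULL check sits between fetching a token and using it, the first token is not skipped and the last iteration never constructs a string from NULL. Note that wcstok() skips runs of delimiters, so empty fields are not reported.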

On the other hand, since you are using C++, you should consider using std::wistringstream and std::getline() instead of wcstok():

#include <string>
#include <sstream>

// after LoadString() exits, resid contains the
// number of characters copied into readMap...
std::wistringstream iss(std::wstring(readMap, resid));

std::wstring line;
while (std::getline(iss, line, L';'))
{
    // use line as needed...
}
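
For reference, here is a minimal self-contained sketch of the same idea that does not depend on LoadString(). The function name splitFields and its default delimiter are assumptions for illustration only.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on delim using std::getline().
// The input is not modified, unlike with wcstok().
std::vector<std::wstring> splitFields(const std::wstring& text, wchar_t delim = L';')
{
    std::vector<std::wstring> fields;
    std::wistringstream iss(text);
    std::wstring field;

    // getline() returns the stream, which tests false once input is exhausted.
    while (std::getline(iss, field, delim))
    {
        fields.push_back(field);
    }
    return fields;
}
```

One behavioral difference to keep in mind: std::getline() reports empty fields between consecutive delimiters, whereas wcstok() skips them.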

But either way, this statement is just plain wrong:

string str(wstr.begin(), wstr.end()); // Problem

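This statement works correctly only if the std::wstring contains nothing but ASCII characters (U+0000 to U+007F). It copies each wchar_t into a char with no encoding conversion, so any character above U+00FF loses its high bits, and characters in the U+0080 to U+00FF range are copied as raw bytes that may not mean the same thing in the target charset. For non-ASCII characters, you have to perform a real data conversion instead.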
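
To make the data loss concrete, here is a small sketch that isolates the question's conversion in a function (the name narrowByTruncation is made up). It assumes the usual behavior of mainstream compilers, where narrowing a wchar_t to char keeps only the low byte.

```cpp
#include <string>

// Hypothetical helper reproducing the question's approach: every wchar_t
// is narrowed to a char individually, with no encoding conversion at all.
std::string narrowByTruncation(const std::wstring& w)
{
    return std::string(w.begin(), w.end());
}
```

Passing L"foo" yields "foo" as expected, but passing the euro sign L"\u20AC" yields a single byte 0xAC on typical compilers, whereas the correct UTF-8 encoding is the three bytes E2 82 AC.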

Since you are running on Windows, you can use the Win32 API WideCharToMultiByte() function:

std::wstring line;
while (std::getline(iss, line, L';'))
{
    std::string str;

    // optionally substitute CP_UTF8 with any ANSI codepage you want...
    int len = WideCharToMultiByte(CP_UTF8, 0, line.c_str(), line.length(), NULL, 0, NULL, NULL);
    if (len > 0)
    {
        str.resize(len);
        WideCharToMultiByte(CP_UTF8, 0, line.c_str(), line.length(), &str[0], len, NULL, NULL);
    }

    // use str as needed...
    MessageBoxW(0, line.c_str(), L"line", 0);
    MessageBoxA(0, str.c_str(), "str", 0);
}

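Or, if you are using C++11 or later, you can use the std::wstring_convert class (only for UTF-8/16/32 conversions, though, and note that it is deprecated as of C++17). Since wchar_t holds UTF-16 on Windows, use std::codecvt_utf8_utf16 so that surrogate pairs are converted correctly:

#include <locale>
#include <codecvt>

std::wstring line;
while (std::getline(iss, line, L';'))
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> conv;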
    std::string str = conv.to_bytes(line);

    // use str as needed...
    MessageBoxW(0, line.c_str(), L"line", 0);
    MessageBoxA(0, str.c_str(), "str", 0);
}
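
If the conversion needs to compile on more than one platform, something along these lines might work. It is only a sketch: the names WideFacet and toUtf8 are invented here, and the facet is chosen on the assumption that a 2-byte wchar_t holds UTF-16 (as on Windows) and a wider one holds UTF-32. Expect deprecation warnings from C++17 onward.

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

// Hypothetical alias: UTF-16 aware facet for 16-bit wchar_t (Windows),
// plain UTF-8 <-> UCS-4 facet where wchar_t is wider.
using WideFacet = std::conditional<sizeof(wchar_t) == 2,
                                   std::codecvt_utf8_utf16<wchar_t>,
                                   std::codecvt_utf8<wchar_t>>::type;

// Hypothetical helper: encodes a wide string as UTF-8.
// to_bytes() throws std::range_error if the input cannot be converted.
std::string toUtf8(const std::wstring& w)
{
    std::wstring_convert<WideFacet, wchar_t> conv;
    return conv.to_bytes(w);
}
```

With this helper, each line read in the loop above could be converted with std::string str = toUtf8(line);.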

Upvotes: 1
