Dunaril

Reputation: 2795

Fatal error in wchar_t* to char* conversion

Here is some C code that converts a wchar_t* string into a char* string:

wchar_t *myXML = L"<test/>";
size_t length;
char *charString;
size_t i;
length = wcslen(myXML);
charString = (char *)malloc(length);
wcstombs_s(&i, charString, length, myXML, length);

The code compiles, but at execution it hits a fatal error on the last line and stops running.

Now, if I replace the last line with this one:

wcstombs_s(&i, charString, length+1, myXML, length);

I just added +1 to the third argument. Then it works perfectly...

Why is this trick needed? Or is there a flaw elsewhere in my code?

Upvotes: 0

Views: 395

Answers (2)

sarnold

Reputation: 104020

DESCRIPTION
       The wcslen() function is the wide-character
       equivalent of the strlen(3) function.  It determines
       the length of the wide-character string pointed to by
       s, not including the terminating L'\0' character.

The trick is that you should always look for code of the form:

string = malloc(len);

very suspiciously, because both wcslen(3) and strlen(3) return the string length without the nul terminator, while the buffer you malloc(3) must have room for that terminator too. C kinda sucks sometimes.

So every time you see string = malloc(len); rather than string = malloc(len+1);, be very careful to read how len gets assigned.

charString = (char *)malloc(length + 1);

Ought to do the trick. :)
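To make the off-by-one concrete, here is a minimal sketch of the len+1 pattern for an ordinary char string. The helper name copy_string is made up for illustration and is not from the question:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL if malloc fails.
   The caller is responsible for calling free() on the result. */
char *copy_string(const char *src)
{
    assert(src != NULL);

    size_t len = strlen(src);        /* length WITHOUT the terminating '\0' */
    char *copy = malloc(len + 1);    /* +1 so the terminator fits */
    if (copy == NULL)
        return NULL;

    memcpy(copy, src, len + 1);      /* copies the characters and the '\0' */
    return copy;
}
```

With malloc(len) instead, the final byte written by memcpy would land one past the end of the allocation, which is the same kind of overflow the question runs into.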

EDIT:

Better would be to ask wcstombs() for the size to allocate in the first place:

size_t len = wcstombs(NULL, src, 0);
if (len == (size_t)-1) /* handle error */ ...
len += 1;                        /* room for the terminating nul */
char *dest = malloc(len);
len = wcstombs(dest, src, len);

The +1 allocates for the ascii nul, and wcstombs() will report how much memory is required to do the conversion. It'll do the conversion twice, once to keep track of the memory required, and then once to store the result, but it will be MUCH simpler to maintain. The second time, when it stores the result, it will write at most len bytes including the ascii nul.
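The two-pass approach could be wrapped up roughly like this. It is only a sketch: the function name wide_to_multibyte is hypothetical, and it assumes a C library where wcstombs() with a NULL destination returns the required byte count (glibc and MSVC behave this way, as far as I know):

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts src to a newly allocated multibyte string
   using the current locale. Returns NULL if a character cannot be
   converted or if malloc fails. The caller frees the result. */
char *wide_to_multibyte(const wchar_t *src)
{
    assert(src != NULL);

    /* First pass: how many bytes are needed, not counting the terminator? */
    size_t needed = wcstombs(NULL, src, 0);
    if (needed == (size_t)-1)
        return NULL;                 /* unconvertible wide character */

    char *dest = malloc(needed + 1); /* +1 for the terminating '\0' */
    if (dest == NULL)
        return NULL;

    /* Second pass: do the conversion into the correctly sized buffer. */
    size_t written = wcstombs(dest, src, needed + 1);
    if (written == (size_t)-1) {
        free(dest);
        return NULL;
    }
    dest[written] = '\0';            /* defensive: guarantee termination */
    return dest;
}
```

Note the comparison against (size_t)-1: the return type is unsigned, so the error value has to be checked before adding 1 to it.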

Upvotes: 2

Thomas

Reputation: 181705

You need one extra byte for the '\0' terminator character. wcslen does not include this in the length it returns!

To do this properly, you don't just need to pass length+1 to wcstombs_s but also to malloc:

charString = (char *)malloc(length+1);
wcstombs_s(&i, charString, length+1, myXML, length);

And even then, I suspect it will not work correctly. Not all wide characters can be mapped to a single char, so for non-ASCII characters you will need extra space in the multi-byte string.
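If you want to size the buffer up front instead of asking the conversion function, one option is a worst-case bound based on MB_CUR_MAX. This is a sketch only; the helper name worst_case_mb_size is made up, and the bound assumes a stateless encoding such as UTF-8:

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: upper bound on the number of bytes needed to hold
   src as a multibyte string in the current locale, terminator included. */
size_t worst_case_mb_size(const wchar_t *src)
{
    assert(src != NULL);

    /* Each wide character converts to at most MB_CUR_MAX bytes,
       and one more byte is needed for the terminating '\0'. */
    return wcslen(src) * (size_t)MB_CUR_MAX + 1;
}
```

In a UTF-8 locale, for example, a character like 'é' takes two bytes, so a buffer of wcslen(myXML) + 1 bytes would be too small as soon as the XML contains one. The value returned here can be passed both to malloc and as the buffer size argument of the conversion call.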

Upvotes: 4
