Reputation: 42690
I was wondering is it safe to do so?
wchar_t wide = /* something */;
assert(wide >= 0 && wide < 256 &&);
char myChar = static_cast<char>(wide);
If I am pretty sure the wide char will fall within ASCII range.
Upvotes: 49
Views: 214651
Reputation: 380
A short function I wrote a while back to pack a wchar_t array into a char array. Characters that aren't on the ANSI code page (0-127) are replaced by '?' characters, and it handles surrogate pairs correctly.
size_t to_narrow(const wchar_t * src, char * dest, size_t dest_len){
size_t i;
wchar_t code;
i = 0;
while (src[i] != '\0' && i < (dest_len - 1)){
code = src[i];
if (code < 128)
dest[i] = char(code);
else{
dest[i] = '?';
if (code >= 0xD800 && code <= 0xDBFF)
// lead surrogate, skip the next code unit, which is the trail
i++;
}
i++;
}
dest[i] = '\0';
return i - 1;
}
Upvotes: 7
Reputation:
An easy way is :
wstring your_wchar_in_ws(<your wchar>);
string your_wchar_in_str(your_wchar_in_ws.begin(), your_wchar_in_ws.end());
char* your_wchar_in_char = your_wchar_in_str.c_str();
I'm using this method for years :)
Upvotes: 10
Reputation:
Here's another way of doing it, remember to use free() on the result.
char* wchar_to_char(const wchar_t* pwchar)
{
// get the number of characters in the string.
int currentCharIndex = 0;
char currentChar = pwchar[currentCharIndex];
while (currentChar != '\0')
{
currentCharIndex++;
currentChar = pwchar[currentCharIndex];
}
const int charCount = currentCharIndex + 1;
// allocate a new block of memory size char (1 byte) instead of wide char (2 bytes)
char* filePathC = (char*)malloc(sizeof(char) * charCount);
for (int i = 0; i < charCount; i++)
{
// convert to char (1 byte)
char character = pwchar[i];
*filePathC = character;
filePathC += sizeof(char);
}
filePathC += '\0';
filePathC -= (sizeof(char) * charCount);
return filePathC;
}
Upvotes: 3
Reputation: 753705
Technically, 'char
' could have the same range as either 'signed char
' or 'unsigned char
'. For the unsigned characters, your range is correct; theoretically, for signed characters, your condition is wrong. In practice, very few compilers will object - and the result will be the same.
Nitpick: the last &&
in the assert
is a syntax error.
Whether the assertion is appropriate depends on whether you can afford to crash when the code gets to the customer, and what you could or should do if the assertion condition is violated but the assertion is not compiled into the code. For debug work, it seems fine, but you might want an active test after it for run-time checking too.
Upvotes: 3
Reputation: 122
one could also convert wchar_t --> wstring --> string --> char
wchar_t wide;
wstring wstrValue;
wstrValue[0] = wide
string strValue;
strValue.assign(wstrValue.begin(), wstrValue.end()); // convert wstring to string
char char_value = strValue[0];
Upvotes: 1
Reputation: 9305
You are looking for wctomb()
: it's in the ANSI standard, so you can count on it. It works even when the wchar_t
uses a code above 255. You almost certainly do not want to use it.
wchar_t
is an integral type, so your compiler won't complain if you actually do:
char x = (char)wc;
but because it's an integral type, there's absolutely no reason to do this. If you accidentally read Herbert Schildt's C: The Complete Reference, or any C book based on it, then you're completely and grossly misinformed. Characters should be of type int
or better. That means you should be writing this:
int x = getchar();
and not this:
char x = getchar(); /* <- WRONG! */
As far as integral types go, char
is worthless. You shouldn't make functions that take parameters of type char
, and you should not create temporary variables of type char
, and the same advice goes for wchar_t
as well.
char*
may be a convenient typedef for a character string, but it is a novice mistake to think of this as an "array of characters" or a "pointer to an array of characters" - despite what the cdecl tool says. Treating it as an actual array of characters with nonsense like this:
for(int i = 0; s[i]; ++i) {
wchar_t wc = s[i];
char c = doit(wc);
out[i] = c;
}
is absurdly wrong. It will not do what you want; it will break in subtle and serious ways, behave differently on different platforms, and you will most certainly confuse the hell out of your users. If you see this, you are trying to reimplement wctombs()
which is part of ANSI C already, but it's still wrong.
You're really looking for iconv()
, which converts a character string from one encoding (even if it's packed into a wchar_t
array), into a character string of another encoding.
Now go read this, to learn what's wrong with iconv.
Upvotes: 17
Reputation: 179819
In general, no. int(wchar_t(255)) == int(char(255))
of course, but that just means they have the same int value. They may not represent the same characters.
You would see such a discrepancy in the majority of Windows PCs, even. For instance, on Windows Code page 1250, char(0xFF)
is the same character as wchar_t(0x02D9)
(dot above), not wchar_t(0x00FF)
(small y with diaeresis).
Note that it does not even hold for the ASCII range, as C++ doesn't even require ASCII. On IBM systems in particular you may see that 'A' != 65
Upvotes: 0
Reputation: 308158
assert
is for ensuring that something is true in a debug mode, without it having any effect in a release build. Better to use an if
statement and have an alternate plan for characters that are outside the range, unless the only way to get characters outside the range is through a program bug.
Also, depending on your character encoding, you might find a difference between the Unicode characters 0x80 through 0xff and their char
version.
Upvotes: 17