Reputation: 103
This program, instead of printing all three characters, prints only one. Why does it behave that way?
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string a("   "); // three placeholder characters
    a[0] = (char)65519; // supposed to be UTF-16 characters
    a[1] = (char)65471;
    a[2] = (char)65469;

    // prints �
    cout << a << std::endl;

    // prints �
    for (int i = 0; i < a.size(); ++i)
    {
        std::cout << a[i];
    }
    cout << "\n";
    return 0;
}
I can understand the � character being printed, because my charset has no valid glyph for it. But why is only one such character printed instead of three? And why does the same thing happen even when I use a for loop?
EDIT: To address the comments below: I am not concerned about the loss of information. I know I am casting an int (32 bits) to a char (8 bits) and losing information. What I want to know is why only one character is printed instead of three.
Upvotes: 1
Views: 171
Reputation: 141
a[0] = (char)65519;
Oh, please, never write things like this. Don't forget that char holds one byte, so the largest value of a signed char is 127. You can also write (char)255 (which means -1 for a signed char, or 255 for an unsigned one), but nothing larger than 255.
For Unicode, write the following instead:
wstring a(L"   "); // three placeholder characters
a[0] = (wchar_t)65519; // supposed to be UTF-16 characters
a[1] = (wchar_t)65471;
a[2] = (wchar_t)65469;
Upvotes: 2
Reputation: 254751
After throwing away half of each 16-bit value, the remaining 8-bit values are:
0xef 0xbf 0xbd
Since these are not ASCII values (which are in the range 0x00 to 0x7f), the output depends on how your terminal interprets non-ASCII values. One common encoding is UTF-8, and these three values happen to form a valid UTF-8 encoding of the Unicode replacement character, which is displayed as �.
Upvotes: 7