tnkousik

Reputation: 103

Weird behaviour in C++

This program, instead of printing all three characters, prints only one. Why does it behave that way?

#include<iostream>
#include<string>

using namespace std;

int main() 
{
    string a("   ");
    a[0] = (char)65519; //Supposed to be UTF 16 characters
    a[1] = (char)65471;
    a[2] = (char)65469;

    //prints �
    cout << a << std::endl;

    //prints �
    for(int i = 0; i < a.size(); ++i) 
    {
        std::cout << a[i];
    }

    cout << "\n";
    return 0;
}

I can understand the � character being printed, because my charset does not have a valid glyph/representation for it. But why is only one such character printed instead of three? And why does the same thing happen even when I use a for loop?

EDIT: From the comments below: I am not concerned about the loss of information. I know I am typecasting a 32-bit integer to an 8-bit char and losing information. What I am asking is: why does it print only one character rather than three?

Upvotes: 1

Views: 171

Answers (2)

Hvarnah

Reputation: 141

a[0] = (char)65519;

Oh, please, never write things like this. Don't forget that a char holds one byte, so the largest value a signed char can represent is 127. You can also write (char)255 (which means -1 for a signed char or 255 for an unsigned one), but nothing larger than 255 fits in a char at all.

For Unicode write the following

wstring a(L"   ");
a[0] = (wchar_t)65519; //Supposed to be UTF 16 characters
a[1] = (wchar_t)65471;
a[2] = (wchar_t)65469;

Upvotes: 2

Mike Seymour

Reputation: 254751

After throwing away half of each 16-bit value, the remaining 8-bit values are:

0xef 0xbf 0xbd

Since these are not ASCII values (which are in the range of 0x00 to 0x7f), the output depends on how your terminal interprets non-ASCII values. One common encoding is UTF-8, and these three values happen to form a valid UTF-8 encoding of the Unicode replacement character, which is displayed as �.

Upvotes: 7
