tnkousik

Reputation: 103

Weird behaviour in C++

This program, instead of printing all three characters, prints only one. Why does it behave that way?

#include<iostream>
#include<string>

using namespace std;

int main() 
{
    string a("   ");
    a[0] = (char)65519; //Supposed to be UTF 16 characters
    a[1] = (char)65471;
    a[2] = (char)65469;

    //prints �
    cout << a << std::endl;

    //prints �
    for(int i = 0; i < a.size(); ++i) 
    {
        std::cout << a[i];
    }

    cout << "\n";
    return 0;
}

I can understand the � character being printed, because my charset does not have a valid glyph/representation for it. But why is only one such character printed instead of three? And why does the same thing happen even when I use a for loop?

EDIT: From the comments below: I am not concerned about the loss of information. I know I am typecasting a 32-bit integer to an 8-bit char and losing information. What I am asking is: why does it print only one character rather than three?

Upvotes: 1

Views: 171

Answers (2)

Hvarnah

Reputation: 141

a[0] = (char)65519;

Oh, please, never write things like this. Don't forget that a char holds one byte, so the largest value a signed char can represent is 127. You can also write (char)255 (which means -1 for a signed char or 255 for an unsigned one), but nothing larger than 255 fits in a char at all.

For Unicode write the following

wstring a(L"   ");
a[0] = (wchar_t)65519; //Supposed to be UTF 16 characters
a[1] = (wchar_t)65471;
a[2] = (wchar_t)65469;

Upvotes: 2

Mike Seymour

Reputation: 254751

After throwing away half of each 16-bit value, the remaining 8-bit values are:

0xef 0xbf 0xbd

Since these are not ASCII values (which are in the range of 0x00 to 0x7f), the output depends on how your terminal interprets non-ASCII values. One common encoding is UTF-8, and these three values happen to form a valid UTF-8 encoding of the Unicode replacement character, which is displayed as �.

Upvotes: 7
