Yavuz

Reputation: 1393

ASCII value of ü in C++

According to that site, the ASCII value of ü is 129, but when I run printf("%d", 'ü'), the output is -4. What causes that?

Upvotes: 0

Views: 1642

Answers (3)

DevSolar

Reputation: 70391

The fact that you get -4 is basically pure chance, as it depends on the locale settings of your environment and on your compiler's implementation.

Others already pointed out that, depending on whether your platform considers char to be signed or not, printing a char value as if it were an integer might yield negative numbers for values of 0x80 and higher.


As for encodings (and be aware that the list below is by no means exhaustive):

ü does not have an ASCII value, as (US-) ASCII only defines characters up to 0x7f (127).

IBM Codepage 437 and 850 (DOS) have ü at 0x81, which is -127 or 129 depending on signedness.

ISO-8859-1 through -4, -9, -10, and -13 through -16 as well as Windows codepages 1250 and 1252 have ü at 0xfc (-4 / 252). The other ISO-8859 encodings don't have the ü in their character set.

UTF-8 - which everyone should be using instead of those 8-bit encodings of yesteryear for a variety of reasons - encodes ü as the two-byte sequence 0xc3 0xbc.

I've put together a side-by-side codepage for personal use. If you are interested, you can find it at my homepage.


Once you have stomached that, be aware that the standard defines two character sets: one for the representation of source, and one for the representation of strings in the executable code. Neither is required to contain any letters beyond the basic a-z and A-Z range, the two might actually differ (think cross-compiler), and neither has its numerical representation defined. You might actually be looking at EBCDIC, where the letters aren't even encoded with consecutive values (i.e., assert( 'Z' - 'A' == 25 ) would fail).

You think that's funny? Well, basically your machine doesn't even have to provide characters like @, as that is ASCII, but not part of the basic character set. ;-)

Generally speaking, once you use non-ASCII characters in source, you have left well-defined behaviour behind and are relying on the implementation / environment.

Upvotes: 4

john

Reputation: 8027

On your system char is a signed type. You should first convert to an unsigned type before printing.

printf("%d", (unsigned char)'ü');

Whether this will print the 129 you expect is another matter, but it will at least print the encoding of ü in your execution character set.

Upvotes: 2

Tevo D

Reputation: 3381

%d prints a signed decimal number, which for a signed byte falls in the range -128 to 127. You probably want to print it as unsigned (%u), after first converting the value to unsigned char, which will output the expected 0-255.

Upvotes: 1
