user13469230


Can %c be given a negative int argument in printf?

Can I pass a negative int to printf with the %c format specifier, given that printf converts the int argument to unsigned char before printing it? Is printf("%c", -65); valid? I tried it with GCC, but the output is a diamond-shaped character with a question mark inside. Why?

Upvotes: 0

Views: 415

Answers (1)

phuclv

Reputation: 41764

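Absolutely yes. %c converts its int argument to unsigned char, so a negative value is fine, and when char is a signed type printf receives negative arguments routinely. C allows char to be either signed or unsigned, and in GCC you can switch between the two with -funsigned-char and -fsigned-char. When char is signed, the call is exactly the same thing as this: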

char c = -65;
printf("%c", c);

When a char variable is passed to printf(), it is promoted to int (sign-extended when char is signed), so printf() sees -65 exactly as if a constant had been passed. Because of the default argument promotions in variadic functions, printf simply has no way to differentiate between printf("%c", c); and printf("%c", -65);
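As a minimal sketch of that point, assuming 8-bit bytes, the byte that %c emits can be captured with snprintf. The helper name byte_written_by_percent_c is hypothetical and exists only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper for illustration only: returns the byte that "%c"
   produces for the given int argument, as an unsigned char value. */
unsigned char byte_written_by_percent_c(int arg)
{
    char buf[2] = {0};
    /* %c converts its int argument to unsigned char and writes that byte. */
    snprintf(buf, sizeof buf, "%c", arg);
    return (unsigned char)buf[0];
}
```

Under these assumptions, byte_written_by_percent_c(-65) and byte_written_by_percent_c(191) both return 0xBF, and the result is the same whether the -65 comes from a literal or from a promoted signed char variable.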

The printed result depends on the character encoding, though. For example, in the ISO-8859-1 or Windows-1252 charsets you'll see ¿ because (unsigned char)-65 == 0xBF. In UTF-8 (which is a variable-length encoding) 0xBF is a continuation byte and is not allowed in the starting position of a character. That's why you see �, which is the replacement character shown for invalid bytes.
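To make the UTF-8 part concrete, here is a small sketch that classifies a single byte by the role it can play in a UTF-8 sequence. Both function names are hypothetical, and the lead-byte range used (0xC2 to 0xF4) is the one for well-formed UTF-8.

```c
#include <stdbool.h>

/* Hypothetical helpers: classify one byte by its role in UTF-8. */

/* Continuation bytes have the bit pattern 10xxxxxx (0x80..0xBF). */
bool utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* A byte can start a character if it is ASCII (0x00..0x7F) or a
   multi-byte lead byte (0xC2..0xF4 in well-formed UTF-8). */
bool utf8_can_start_sequence(unsigned char b)
{
    return b <= 0x7F || (b >= 0xC2 && b <= 0xF4);
}
```

For the byte in question, utf8_is_continuation(0xBF) is true and utf8_can_start_sequence(0xBF) is false, which is why a UTF-8 terminal has nothing valid to display for a lone 0xBF.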

Please tell me why code points 0 to 255 are not mapped to 0 to 255 in unsigned char. I mean, they are non-negative, so shouldn't I just look through the UTF-8 character set for their corresponding values?

The mapping is not done by relative position in the range as you thought, i.e. it is not the case that code point 0 maps to CHAR_MIN, code point 40 maps to CHAR_MIN + 40, code point 255 maps to CHAR_MAX, and so on. On two's complement systems it's typically a simple mapping based on the value of the bit pattern when treated as unsigned. That follows from the way values are usually truncated from a wider type. In C a character literal like 'a' has type int. Suppose 'a' is mapped to code point 130 in some theoretical character set; then the two lines below are equivalent:

char c = 'a';   /* either this ... */
char c = 130;   /* ... or this (alternatives, not two declarations) */

Either way, c ends up holding the value of 'a' converted to char, i.e. (char)'a', which may be a negative value.

So code points 0 to 255 are mapped to 0 to 255 in unsigned char. That means code point 0x1F will be stored in a char (signed or unsigned) with value 0x1F. Code point 0xBF will be stored as 0xBF if char is unsigned and as -65 if char is signed.
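The two views of the same bit pattern can be sketched as follows, assuming 8-bit bytes and a two's complement representation. Both helper names are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: reinterpret the bit pattern of a byte as a
   signed char. Assumes 8-bit bytes and two's complement. */
int byte_as_signed_char(unsigned char byte)
{
    signed char s;
    memcpy(&s, &byte, 1);   /* copy the bit pattern unchanged */
    return s;
}

/* The reverse direction: converting to unsigned char is defined as
   reduction modulo UCHAR_MAX + 1 (256 for 8-bit bytes). */
int signed_value_as_byte(int value)
{
    return (unsigned char)value;
}
```

Under those assumptions, byte_as_signed_char(0xBF) gives -65 and signed_value_as_byte(-65) gives 0xBF, while 0x1F comes out as 0x1F in both directions because its top bit is clear.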

I'm assuming an 8-bit char in everything above. Also note that UTF-8 is an encoding of the Unicode character set, not a charset of its own, so there are no "UTF-8 code points" to look up.
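To illustrate the difference between a code point and its UTF-8 bytes, here is a minimal sketch that encodes code points below U+0800 only. The function utf8_encode_small is hypothetical and deliberately incomplete; larger code points need 3 or 4 bytes, which this sketch does not handle.

```c
#include <stddef.h>

/* Hypothetical helper: encode a code point below U+0800 as UTF-8.
   Writes 1 or 2 bytes to out and returns the number of bytes written,
   or 0 if cp is outside the range this sketch handles. */
size_t utf8_encode_small(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80) {                 /* ASCII: one byte, same value */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    return 0;
}
```

For code point U+00BF (¿) this produces the two bytes 0xC2 0xBF, so on a UTF-8 terminal the character is printed by sending both bytes, not the single byte 0xBF.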

Upvotes: 4
