2OL1

Reputation: 87

Mysterious ASCII value when working with chars and integers in C

The idea that in C a char looks up a value in ASCII (but doesn't become an integer) makes sense.

I wrote some code to illustrate this point, in which an integer value above 256 (256 possible values and the total number of ASCII items) wraps around to the beginning of ASCII, or 0. Interesting to me was that arithmetic can be performed when starting with an integer d and adding an integer to the character c.

// C starts as a character
char c = 'c';
printf("c equals %i\n", c);
printf("c in ascii: %c\n", c);
printf("\n");

// I starts as an integer
int i = 105;
printf("i equals %i\n", i);
printf("i in ascii: %c\n", i);
printf("\n");

// Using arithmetic on character 'c'
int d = c + 1;
printf("d equals %i\n", d);
printf("d in ascii: %c\n", d);
printf("\n");

// The value of a in ascii (97) + the number of ascii characters (256)
int a = 353;
printf("a equals %i\n", a);
printf("a in ascii: %c\n", a);

Output:
c equals 99
c in ascii: c

i equals 105
i in ascii: i

d equals 100
d in ascii: d

a equals 353
a in ascii: a

However, I encountered a mystery when starting with a char d and adding an integer to another char c.

// This makes sense...
char c = 'c';
int z = c + 100;
// But I would expect d to equal 199 as for z
char d = c + 100;

printf("c equals %i\n", c);
printf("z equals: %i\n", z);
printf("d equals %i\n", d);
printf("d equals %c\n", d);

Output:
c equals 99
z equals: 199
d equals -57
d equals 

d mysteriously becomes -57 and returns blank space when called as a char. A debugger shows me that d has an ASCII value of '\307', which I can't explain.

Upvotes: 0

Views: 855

Answers (1)

Eric Postpischil

Reputation: 222753

The idea that in C a char looks up a value in ASCII (but doesn't become an integer) makes sense.

In C, a character constant is an integer and has type int.
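One way to see this for yourself is to compare sizes. The following is a minimal sketch; the function names are made up for illustration. It must be compiled as C, since C++ gives character constants type char.

```c
#include <stddef.h>

/* Size of a character constant. In C, 'c' has type int,
   so this equals sizeof(int). */
size_t char_constant_size(void)
{
    return sizeof 'c';
}

/* Size of a char object. This is 1 by definition. */
size_t char_object_size(void)
{
    char c = 'c';
    return sizeof c;
}
```

On a typical implementation with a four-byte int, char_constant_size() returns 4 while char_object_size() returns 1.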

C implementations do not necessarily use ASCII. The compiler generally does not have to look it up, because it receives the source code already encoded as bytes in a file or stream. It may have to do some translation between different character encodings, such as between ASCII and UTF-8.

I wrote some code to illustrate this point, in which an integer value above 256 (256 possible values and the total number of ASCII items) wraps around to the beginning of ASCII, or 0.

You should not rely on this behavior without understanding it. It may not always happen that way.

int a = 353;
printf("a equals %i\n", a);
printf("a in ascii: %c\n", a);

When the %c conversion is used, the value passed for it is converted to an unsigned char, per C 2018 7.21.6.1 8. In common C implementations, unsigned char is eight bits. Per C 2018 6.3.1.3 2, this conversion works modulo 256; it wraps as you described. Thus the character with code 353−256 = 97 is printed. This has nothing to do with ASCII; it is a result of unsigned char using eight bits. If the C implementation uses ASCII, then the value of 97 will cause an “a” to be printed.
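The conversion can be reproduced without printf. This is a small sketch, assuming an eight-bit unsigned char; as_printed_char is a hypothetical helper name, not a library function.

```c
/* Mirrors the conversion printf applies to a %c argument:
   the int value is converted to unsigned char, which wraps
   modulo UCHAR_MAX + 1 (256 when unsigned char is eight bits). */
unsigned char as_printed_char(int value)
{
    return (unsigned char)value;
}
```

With eight-bit bytes, as_printed_char(353) gives 97. Negative values wrap the same way, so as_printed_char(-57) gives 199.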

char c = 'c';
int z = c + 100;
char d = c + 100;

printf("d equals %i\n", d);
printf("d equals %c\n", d);

In char d = c + 100;, the arithmetic is performed using int types. This is because 100 is an int constant, and the operands of + are converted to have a common type. (There are some complicated rules for this.) Since the character 'c' has the value 99, the variable c is 99, and c + 100 yields 199.
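If you want to check the type of c + 100 yourself, C11's _Generic can report it. This is only a sketch; IS_INT and the two functions are names invented for the example, and it assumes a typical implementation where char is narrower than int.

```c
/* Evaluates to 1 if the expression has type int, 0 otherwise.
   _Generic selects on the type only; the expression is not evaluated. */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* The sum of a char and an int constant has type int. */
int sum_is_int(void)
{
    char c = 'c';
    return IS_INT(c + 100);
}

/* The char variable by itself still has type char. */
int char_is_int(void)
{
    char c = 'c';
    return IS_INT(c);
}
```

Here sum_is_int() returns 1 and char_is_int() returns 0, which shows that the narrowing happens only when the int result is stored back into d.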

Then the char d is initialized with 199. The C standard permits char to be signed or unsigned. It appears in your implementation that char is signed and eight bits, with values ranging from −128 to +127. So 199 cannot be represented in a char. Then the rules in C 2018 6.3.1.3 3 say that 199 is converted to an implementation-defined value or produces a signal.

It appears your implementation wraps this value modulo 256. So the result is 199−256 = −57, which is representable in a char, so d is initialized to −57.
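Whether char is signed, and whether a value fits, can be checked with the macros in <limits.h>. The sketch below uses helper names of my own choosing and assumes eight-bit bytes; it also shows that unsigned char avoids the implementation-defined conversion when you want 199.

```c
#include <limits.h>

/* 1 if plain char is signed in this implementation, 0 if unsigned. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* 1 if value is representable in a plain char, so that storing it
   does not trigger the implementation-defined conversion. */
int fits_in_char(int value)
{
    return value >= CHAR_MIN && value <= CHAR_MAX;
}

/* The addition is done in int; converting the result to unsigned char
   is well defined and wraps modulo UCHAR_MAX + 1. */
unsigned char add_as_unsigned(unsigned char c, int n)
{
    return (unsigned char)(c + n);
}
```

Where char is signed and eight bits, fits_in_char(199) is 0, while add_as_unsigned(99, 100) is 199 regardless of the signedness of char.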

Then, when you print this with %i, “−57” is printed.

When you print it with “%c”, it is converted to an unsigned char, as described above. This yields −57+256 = 199. This is not a code for an ASCII character, so your C implementation prints whatever character it has for value 199. That could appear as a blank space.

A debugger shows me that d has an ASCII value of '\307',…

\nnn is a common way of writing characters using octal. \307 means 307₈ = 3•8² + 0•8¹ + 7•8⁰ = 3•64 + 0•8 + 7•1 = 192 + 0 + 7 = 199.
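The same arithmetic can be written in C. This is a sketch with hypothetical helper names, assuming eight-bit bytes.

```c
/* Value of a three-digit octal number from its digits:
   d2*8^2 + d1*8^1 + d0*8^0. */
int octal3(int d2, int d1, int d0)
{
    return d2 * 64 + d1 * 8 + d0;
}

/* The escape '\307' may be negative where char is signed;
   converting it to unsigned char gives the byte value. */
unsigned char escape_307(void)
{
    return (unsigned char)'\307';
}
```

Both octal3(3, 0, 7) and escape_307() give 199, and the octal integer constant 0307 has that value too.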

Upvotes: 1
