Chlebik

Reputation: 676

How does the C language transform a char literal to a number and vice versa?

I've been diving into C/low-level programming/system design recently. As a seasoned Java developer I still remember my attempts to pass the SUN Java Certification and the questions about whether the char type in Java can be cast to Integer and how that can be done. That is what I know and remember - numbers up to 255 can be treated either as numbers or as characters, depending on the cast.

Getting to know C, I want to understand more, but I find it hard to find a proper answer (I tried googling, but I usually get a gazillion results on how to convert char to int in code) to how EXACTLY it works that the C compiler/system calls transform a number into a character and vice versa.

AFAIK, memory stores only numbers. So let's assume that in some memory cell we store the value 65 (which is the letter 'A'). There is a value stored, and at some point C code wants to read it and store it into a char variable. So far so good. And then we call printf with the %c format specifier for that char parameter.
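
To make the scenario concrete, here is a minimal sketch of what I mean (assuming an ASCII-compatible execution character set):

#include <stdio.h>

int main(void)
{
    char letter = 65;       /* the memory cell just holds the number 65 */
    printf("%c\n", letter); /* prints A  */
    printf("%d\n", letter); /* prints 65 */
    return 0;
}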

And here is where the magic happens - HOW EXACTLY does printf know that the character with value 65 is the letter 'A' (and should display it as a letter)? It is a basic character from the raw ASCII range (not some funny emoji-style UTF character). Does it call external standard libraries/system calls to consult the encoding system? I would love some nitty-gritty, low-level explanation, or at least a link to a trusted source.

Upvotes: 0

Views: 171

Answers (3)

"HOW EXACTLY printf knows that character with value 65 is letter 'A' (and should display it as a letter)."

It usually doesn't, and it does not even need to. Even the compiler does not see the characters ', A and ' in the C language fragment

char a = 'A';
printf("%c", a);

If the source and execution character sets are both ASCII or ASCII-compatible, as is usually the case nowadays, the compiler will see, somewhere in its stream of input bytes, the triplet 39, 65, 39 - or rather 00100111 01000001 00100111. Its parser has been programmed with a rule that something between two 00100111s is a character literal, and since 01000001 is not a magic value (like a backslash starting an escape sequence) it is translated as is into the final program.
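
A quick way to see that the literal ends up as nothing but a number (assuming, again, an ASCII-compatible execution character set):

#include <stdio.h>

int main(void)
{
    /* The compiler has already reduced 'A' to its numeric value;
       at runtime there is only the number. */
    printf("%d\n", 'A');        /* prints 65 on ASCII-compatible systems */
    printf("%d\n", 'A' == 65);  /* prints 1 on such systems              */
    return 0;
}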

The C program, at runtime, then handles 01000001 all the time (though from time to time it might be 01000001 zero-extended to an int, e.g. 00000000 00000000 00000000 01000001 on 32-bit systems; adding leading zeroes does not change its numerical value). On some systems, printf - or rather the underlying internal file routines - might translate the character value 01000001 to something else. But on most systems, 01000001 will be passed to the operating system as is. Then the operating system - or possibly a GUI program receiving the output from the operating system - will want to display that character, so the display font is consulted for the glyph that corresponds to 01000001, and usually the glyph for 01000001 looks something like

A

And that will be displayed to the user.

At no point does the system really operate on glyphs or characters, just on binary numbers. The system in itself is a Chinese room.
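
A small sketch of the "passed as is" part, assuming a POSIX system where write is available (an assumption, not standard C); all three output calls end up sending the same byte value to standard output:

#include <stdio.h>
#include <unistd.h>  /* write() is POSIX, not standard C - an assumption here */

int main(void)
{
    char c = 65;
    printf("%c", c);   /* printf forwards the byte unchanged        */
    putchar(c);        /* so does putchar                           */
    fflush(stdout);    /* flush stdio's buffer before the raw call  */
    write(1, &c, 1);   /* the raw system call sends that same byte  */
    printf("\n");
    return 0;
}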


The real magic of printf is not how it handles characters, but how it handles numbers, as these are converted to more characters. While %c passes values as is, %d will convert even such a simple integer value as 0b101111000110000101001110 to the stream of bytes 0b00110001 0b00110010 0b00110011 0b00110100 0b00110101 0b00110110 0b00110111 0b00111000 so that the display routine will correctly display it as

12345678
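
To illustrate, here is a rough sketch of the kind of digit-by-digit conversion %d performs; print_decimal is a made-up helper, not how any particular libc actually implements it:

#include <stdio.h>

/* Turn a non-negative integer into ASCII digit bytes ('0' is 48). */
static void print_decimal(unsigned n)
{
    char buf[16];
    int  i = 0;
    do {
        buf[i++] = (char)('0' + n % 10); /* lowest digit first into the buffer */
        n /= 10;
    } while (n > 0);
    while (i > 0)
        putchar(buf[--i]);               /* emit most significant digit first */
}

int main(void)
{
    print_decimal(12345678); /* sends bytes 49 50 51 52 53 54 55 56, i.e. "12345678" */
    putchar('\n');
    return 0;
}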

Upvotes: 4

luser droog

Reputation: 19504

The C language is largely agnostic about the actual encoding of characters. It has a source character set, which defines how the compiler treats characters in the source code. So, for instance, on an old IBM system the source character set might be EBCDIC, where 65 does not represent 'A'.

C also has an execution character set, which defines the meaning of characters in the running program. This is the one that seems more pertinent to your question, but it doesn't really affect the behavior of I/O functions like printf. Instead it affects the results of ctype.h functions like isalpha and toupper. printf just treats the argument as a char-sized value, which it receives as an int due to variadic functions using default argument promotions (any integer type smaller than int is promoted to int, and float is promoted to double). printf then shuffles that same value off to the stdout file, and then it's somebody else's problem.
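
A small illustration of those default argument promotions; show_promoted is a made-up variadic function, and the point is that a char argument arrives as an int and must be read back with va_arg(ap, int):

#include <stdarg.h>
#include <stdio.h>

static void show_promoted(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        int v = va_arg(ap, int);              /* not va_arg(ap, char) */
        printf("received int %d -> %c\n", v, (char)v);
    }
    va_end(ap);
}

int main(void)
{
    char c = 'A';
    show_promoted(2, c, 'B'); /* both arguments are promoted to int before the call */
    return 0;
}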

If the source character set and execution character set are different, then the compiler will perform the appropriate conversion, so the source token 'A' will be manipulated in the running program as the corresponding A from the execution character set. The choice of actual encoding for the two character sets, i.e. whether it is ASCII or EBCDIC or something else, is implementation defined.

With a console application, it is the console or terminal receiving the character value that has to look it up in a font's glyph table to display the correct image of the character.

Character constants are of type int. Except for the fact that it is implementation defined whether char is signed or unsigned, a char can mostly be treated as a narrow integer. The only conversion needed between the two is narrowing or widening (and possibly sign extension).
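
Two quick demonstrations of those points; the exact output depends on the implementation, in particular on whether plain char is signed:

#include <stdio.h>

int main(void)
{
    /* Character constants have type int, so this typically prints
       something like "4 1" rather than "1 1". */
    printf("%zu %zu\n", sizeof 'A', sizeof(char));

    /* Whether plain char is signed is implementation defined; on a
       typical signed-char implementation the widening below
       sign-extends and prints -1, on an unsigned-char one it prints 255. */
    char c = (char)0xFF;
    printf("%d\n", (int)c);
    return 0;
}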

Upvotes: 4

0___________

Reputation: 67747

char in C is just an integer CHAR_BIT bits wide. Usually that is 8 bits.
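
For example, limits.h exposes these properties directly; on typical machines this reports 8 bits:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("char is %d bits wide, range %d..%d\n",
           CHAR_BIT, CHAR_MIN, CHAR_MAX);
    return 0;
}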

HOW EXACTLY does printf know that the character with value 65 is the letter 'A'

The implementation knows what character encoding it uses, and the printf function code takes the appropriate action to output the letter 'A'.

Upvotes: -1
