Reputation: 676
I've been diving into C, low-level programming, and system design recently. As a seasoned Java developer I still remember my attempts to pass the Sun Java Certification and the questions about whether the char type in Java can be cast to Integer and how that is done. What I know and remember is that numbers up to 255 can be treated either as numbers or as characters, depending on the cast.
Now that I'm getting to know C I want to understand this more deeply, but I find it hard to find a proper answer (I tried googling, but I usually get a gazillion results on how to convert char to int in code) to how EXACTLY it works that the C compiler/system calls transform a number into a character and vice versa.
AFAIK only numbers are stored in memory. So let's assume that in a memory cell we store the value 65 (which is the letter 'A'). There is a value stored, and at some point C code wants to read it and store it in a char variable. So far so good. Then we call printf with the %c format specifier for that char parameter.
And here is where the magic happens - HOW EXACTLY does printf know that the character with value 65 is the letter 'A' (and should display it as a letter)? It is a basic character from the plain ASCII range (not some fancy emoji-style UTF character). Does it call external standard libraries/system calls to consult an encoding table? I would love a nitty-gritty, low-level explanation or at least a link to a trusted source.
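To make the scenario concrete, here is a minimal sketch of what I mean (assuming an ASCII-compatible system):

#include <stdio.h>

int main(void)
{
    int n = 65;    /* a plain number stored in memory */
    char c = n;    /* the same value, now held in a char */

    printf("%d\n", n);   /* prints 65 */
    printf("%c\n", c);   /* prints A (on an ASCII-compatible system) - how does printf "know"? */
    return 0;
}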
Upvotes: 0
Views: 171
Reputation: 133988
"HOW EXACTLY
printf
knows that character with value 65 is letter 'A' (and should display it as a letter)."
It usually doesn't, and it does not even need to. Even the compiler does not see the characters ', A and ' in the C language fragment

char a = 'A';
printf("%c", a);
If the source and execution character sets are both ASCII or ASCII-compatible, as is usually the case nowadays, the compiler will have among the stream of bytes the triplet 39, 65, 39 - or rather 00100111 01000001 00100111. And its parser has been programmed with a rule that something between two 00100111s is a character literal, and since 01000001 is not a magic value it is translated as is to the final program.
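If you want to see this for yourself, here is a small sketch (assuming an ASCII-compatible implementation) showing that the character literal and the number are the same value to the program:

#include <stdio.h>

int main(void)
{
    char a = 'A';          /* the compiler stored the byte 01000001 (65) */

    printf("%d\n", a);     /* prints 65: the same value, formatted as a number */
    printf("%d\n", 'A');   /* prints 65: the literal is just that value */
    printf("%c\n", 65);    /* prints A: %c emits the byte as-is */
    return 0;
}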
The C program, at runtime, then handles 01000001 all the time (though from time to time it might be 01000001 zero-extended to an int, e.g. 00000000 00000000 00000000 01000001 on 32-bit systems; adding leading zeroes does not change its numerical value). On some systems, printf - or rather the underlying internal file routines - might translate the character value 01000001 to something else. But on most systems, 01000001 will be passed to the operating system as is. Then the operating system - or possibly a GUI program receiving the output from the operating system - will want to display that character, so the display font is consulted for the glyph that corresponds to 01000001, and usually the glyph for letter 01000001 looks something like the familiar shape of the capital letter A.
And that will be displayed to the user.
At no point does the system really operate with glyphs or characters but just binary numbers. The system in itself is a Chinese room.
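As a rough illustration (assuming a POSIX system where write is available), you can skip printf entirely and hand the raw byte to the operating system yourself; the terminal still draws an A, because the glyph lookup happens there, not in your program:

#include <unistd.h>   /* write(); assuming a POSIX system */

int main(void)
{
    unsigned char byte = 65;           /* just a number as far as the program is concerned */
    write(STDOUT_FILENO, &byte, 1);    /* pass the raw byte to the operating system */
    write(STDOUT_FILENO, "\n", 1);
    return 0;                          /* the terminal picks the glyph and shows: A */
}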
The real magic of printf is not how it handles characters, but how it handles numbers, as these are converted to more characters. While %c passes values as-is, %d will convert such a simple integer value as 0b101111000110000101001110 to a stream of bytes 0b00110001 0b00110010 0b00110011 0b00110100 0b00110101 0b00110110 0b00110111 0b00111000 so that the display routine will correctly display it as 12345678.
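A very rough sketch of the kind of work %d has to do (this is not the actual library code, and the print_decimal helper is made up for illustration) is repeated division by ten, producing one byte per decimal digit:

#include <stdio.h>

/* Sketch of a %d-style conversion; not printf's real implementation. */
static void print_decimal(unsigned int value)
{
    char digits[16];
    int n = 0;

    do {
        digits[n++] = '0' + value % 10;   /* e.g. 0b00110000 + 8 gives the byte for '8' */
        value /= 10;
    } while (value != 0);

    while (n > 0)                         /* digits were produced least significant first */
        putchar(digits[--n]);
}

int main(void)
{
    print_decimal(12345678);   /* emits the bytes for '1' '2' '3' '4' '5' '6' '7' '8' */
    putchar('\n');
    return 0;
}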
Upvotes: 4
Reputation: 19504
The C language is largely agnostic about the actual encoding of characters. It has a source character set which defines how the compiler treats characters in the source code. So, for instance, on an old IBM system the source character set might be EBCDIC, where 65 does not represent 'A'.
C also has an execution character set which defines the meaning of characters in the running program. This is the one that seems more pertinent to your question. But it doesn't really affect the behavior of I/O functions like printf. Instead it affects the results of ctype.h functions like isalpha and toupper. printf just treats it as a char-sized value which it receives as an int due to variadic functions using default argument promotions (any type smaller than int is promoted to int, and float is promoted to double). printf then shuffles off the same value to the stdout file and then it's somebody else's problem.
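As a small sketch of that promotion, and of where the execution character set actually matters (assuming an ASCII-based execution character set here):

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    char c = 'A';

    /* c is promoted to int when passed to the variadic printf;
       %c simply writes that value back out as one byte. */
    printf("%c %d\n", c, c);   /* prints "A 65" with an ASCII execution character set */

    /* The execution character set matters to ctype.h, not to printf: */
    if (isalpha((unsigned char)c))
        printf("%c\n", tolower((unsigned char)c));   /* prints "a" */
    return 0;
}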
If the source character set and execution character set are different, then the compiler will perform the appropriate conversion, so the source token 'A' will be manipulated in the running program as the corresponding A from the execution character set. The choice of actual encoding for the two character sets, i.e. whether it's ASCII or EBCDIC or something else, is implementation-defined.
With a console application, it is the console or terminal that receives the character value and has to look it up in a font's glyph table to display the correct image of the character.
Character constants are of type int. Except for the fact that it is implementation-defined whether char is signed or unsigned, a char can mostly be treated as a narrow integer. The only conversion needed between the two is narrowing or widening (and possibly sign extension).
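To see the first point in action (in C, not C++), sizeof applied to a character constant gives the size of an int, while a char object is always 1 byte:

#include <stdio.h>

int main(void)
{
    char c = 'A';   /* the int value 65 is narrowed into a char */

    printf("%zu\n", sizeof 'A');   /* size of int, commonly 4 (this is C; C++ differs) */
    printf("%zu\n", sizeof c);     /* always 1: sizeof(char) is 1 by definition */
    return 0;
}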
Upvotes: 4
Reputation: 67747
char in C is just an integer CHAR_BIT bits long. Usually it is 8 bits long.
HOW EXACTLY printf knows that character with value 65 is letter 'A'
The implementation knows what character encoding it uses, and the printf function code takes the appropriate action to output the letter 'A'.
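For what it's worth, CHAR_BIT lives in <limits.h>, so you can check it (and the encoding of 'A') on your own implementation:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);     /* 8 on virtually all current systems */
    printf("'A' as a number: %d\n", 'A');    /* 65 with an ASCII-compatible encoding */
    return 0;
}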
Upvotes: -1