jmat

Reputation: 320

How do table mappings work in C?

I hope this question makes sense! I'm currently learning C (go easy!) and I'm interested in how table mappings work.

I'm using the extended ASCII table as an experiment. (http://www.ascii-code.com)

For example I can create a char and set its value to a tilde like so:

char charSymbol = '~';

And I can also specify the exact same value like so:

char charDec = 126;  
char charHex = 0x7E; 
char charOct = 0176;
char charBin = 0b01111110;

Regardless of which of the above declarations I choose (if I'm understanding things correctly) the value that's held in memory for each of these variables is always exactly the same. That is, the binary representation (01111110)

My question is: does the compiler hold the extended ASCII table and perform the binary value lookup during compilation? And if that's the case, does the machine the program is running on also hold the extended ASCII table, so that when the program is asked to print 01111110 to the screen it knows to print a "~"?

Upvotes: 3

Views: 1210

Answers (4)

Keith Thompson

Reputation: 263647

For most of the code in your question, no ASCII lookup table is needed.

Note that in C, char is an integer type, just like int, but narrower. A character constant like 'x' has (for historical reasons) type int, and on an ASCII-based system 'x' is pretty much identical to 120.

char charDec = 126;  
char charHex = 0x7E; 
char charOct = 0176;
char charBin = 0b01111110;

(Standard C does not support binary constants like 0b01111110; that's a gcc extension.)

When the compiler sees an integer constant like 126 it computes an integer value from it. For this, it needs to know that 1, 2, and 6 are decimal digits, and what their values are.

char charSymbol = '~';

For this, the compiler just needs to recognize that ~ is a valid character.

The compiler reads all these characters from a text file, your C source. Each character in that file is stored as a sequence of 8 bits, which represent a number from 0 to 255.

So if your C source code contains:

putchar('~');

(and ~ happens to have the value 126), then all the compiler needs to know is that 126 is a valid character value. It generates code that sends the value 126 to the putchar() function. At run time, putchar sends that value to the standard output stream. If standard output is going to a file, the value 126 is stored in that file. If it's going to a terminal, the terminal software will do some kind of lookup to map the number 126 to the glyph that displays as the tilde character.

Compilers have to recognize specific character values. They have to recognize that + is the plus character, which is used to represent the addition operator. But for input and output, no ASCII mapping is needed, because each ASCII character is represented as a number at all stages of processing, from compilation to execution.

So how does a compiler recognize the '+' character? C compilers are typically written in C. Somewhere in the compiler's own sources, there's probably something like:

switch (c) {
    ...
    case '+':
        /* code to handle + character */
    ...
}

So the compiler recognizes + in its input because there's a + in its own source code -- and that + (stored in the compiler source code as the 8-bit number 43) resulted in the number 43 being stored in the compiler's own executable machine code.

Obviously the first C compiler wasn't written in C, because there was nothing to compile it with. Early C compilers were written in B, or in BCPL, or in assembly language -- each of which is processed by a compiler or assembler that probably recognizes + because there's a + in its own source code. Each generation of C compiler passes the "knowledge" that + is 43 on to the next C compiler that it compiles. That "knowledge" is not necessarily written down in the source code; it's propagated each time a new compiler is compiled using an old one.

For a discussion of this, see Ken Thompson's article "Reflections on Trusting Trust".

On the other hand, you can also have, for example, a compiler running on an ASCII-based system that generates code for an EBCDIC-based system, or vice versa. Such a compiler would have to have a lookup table mapping from one character set to the other.

Upvotes: 3

ultimate cause

Reputation: 2304

The C language has pretty weak type safety, which is why you can always assign an integer to a character variable.

You used different representations of the same integer to assign to the character variable - all of them supported by the C programming language.

When you typed a "~" in a text file for your C program, your text editor converted the keystroke and stored its ASCII equivalent. So when the compiler parsed the C code, it never saw a ~ (tilde) as such; it saw a byte value. While parsing, when the compiler encountered the ASCII equivalent of ' (a single quote), it switched into a mode where it read the next byte as something that fits in a char variable, followed by another ' (single quote). Since a char variable can hold 256 different values (0-255), it covers the whole ASCII set, extended characters included.

The same is true when you use an assembler.

Printing to the screen is an entirely different game - that is part of the I/O system. When you press a specific key on the keyboard, a pulse carrying the mapped integer goes in and settles in the memory of the reading program. Similarly, when you send a specific integer to a printer or screen, that integer takes the shape of the corresponding character.

Therefore, if you want to print the integer held in an int variable, there are routines that convert each of its digits to the corresponding ASCII code and send them out, and the I/O system turns them into characters on the screen.

Upvotes: 1

nneonneo

Reputation: 179717

Actually, technically speaking your text editor is the one with the ASCII (or Unicode) table. The file is saved simply as a sequence of bytes; a compiler doesn't actually need to have an ASCII table, it just needs to know which bytes do what. (Yes, the compiler logically interprets the bytes as ASCII, but if you looked at the compiler's machine code all you'd see is a bunch of comparisons of the bytes against fixed byte values).

On the flip side, the executing computer has an ASCII table somewhere to map the bytes output by the program into readable characters. This table is probably in your terminal emulator.

Upvotes: 3

TDHofstetter

Reputation: 268

All those values are exactly equal to each other - they're just different representations of the same value, so the compiler sees them all in exactly the same way after translation from your written text into the byte value.

Upvotes: 0
