Reputation: 109
I have recently been reading The C Programming Language by Kernighan and Ritchie. There is an example that declares a variable as int but uses getchar() to store a value in it:
int x;
x = getchar();
Why can we store char data in an int variable?
The only thing I can think of is ASCII and Unicode. Am I right?
Upvotes: 3
Views: 6801
Reputation: 144695
getchar() attempts to read a byte from the standard input stream. The return value can be any possible value of the type unsigned char (from 0 to UCHAR_MAX), or the special value EOF, which is specified to be negative.
On most current systems, UCHAR_MAX is 255 as bytes have 8 bits, and EOF is defined as -1, but the C Standard does not guarantee this: some systems have larger unsigned char types (9 bits, 16 bits...), and it is possible, although I have never seen it, that EOF be defined as another negative value.
Storing the return value of getchar() (or getc(fp)) to a char would prevent proper detection of end of file. Consider these cases (on common systems):
If char is an 8-bit signed type, a byte value of 255, which is the character ÿ in the ISO8859-1 character set, has the value -1 when converted to a char. Comparing this char to EOF will yield a false positive.
If char is unsigned, converting EOF to char will produce the value 255, which is different from EOF, preventing the detection of end of file.
These are the reasons for storing the return value of getchar() into an int variable. This value can later be converted to a char, once the test for end of file has failed.
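For illustration, here is a minimal sketch of that idiom (assuming only the standard <stdio.h>): the result of getchar() is kept in an int and compared against EOF before it is treated as a character.

#include <stdio.h>

int main(void)
{
    int c;  /* int, not char, so EOF stays distinct from the byte value 255 */

    while ((c = getchar()) != EOF) {
        putchar(c);  /* once EOF has been ruled out, c holds an unsigned char value */
    }
    return 0;
}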
Storing an int into a char has implementation-defined behavior if the char type is signed and the value of the int is outside the range of the char type. This is a technical problem that could have been avoided by mandating that char be unsigned, but the C Standard allowed for the many existing implementations where the char type was signed. It would take a vicious implementation to have unexpected behavior for this simple conversion.
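To make this visible, here is a small sketch for a typical system where char is a signed 8-bit type (the exact result of the conversion is implementation-defined):

#include <stdio.h>

int main(void)
{
    int byte = 255;        /* a value getchar() could return for the byte 0xFF */
    char c = (char)byte;   /* implementation-defined result if char is signed */

    /* On a common system with a signed 8-bit char this prints -1,
       which is exactly why comparing a char against EOF goes wrong. */
    printf("%d\n", (int)c);
    return 0;
}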
The value of the char does indeed depend on the execution character set. Most current systems use ASCII or some extension of ASCII such as ISO8859-x, UTF-8, etc., but the C Standard supports other character sets such as EBCDIC, where the lowercase letters do not form a contiguous range.
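This is why portable code uses the <ctype.h> functions instead of assuming any particular layout of the letters; a minimal sketch of an uppercasing filter:

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    int c;

    /* toupper() works whether the execution character set is ASCII,
       ISO8859-1 or EBCDIC; no contiguous range of letters is assumed. */
    while ((c = getchar()) != EOF) {
        putchar(toupper(c));
    }
    return 0;
}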
Upvotes: 2
Reputation: 409166
The getchar function (and similar character input functions) returns an int because of EOF. There are cases when (char) EOF != EOF (like when char is an unsigned type).
Also, in many places where one uses a char variable, it will silently be promoted to int anyway. And that includes character constants like 'A'.
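A small sketch of that promotion; note that in C (unlike C++) a character constant such as 'A' already has type int:

#include <stdio.h>

int main(void)
{
    char c = 'A';

    /* 'A' is an int in C, so sizeof 'A' equals sizeof(int), not sizeof(char). */
    printf("sizeof 'A' = %zu, sizeof(char) = %zu\n", sizeof 'A', sizeof(char));

    /* c is promoted to int before the addition below. */
    printf("c + 1 = %d\n", c + 1);
    return 0;
}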
Upvotes: 6
Reputation: 20772
C requires int be at least as many bits as char. Therefore, int can store the same values as char (allowing for signed/unsigned differences). In most cases, int is a lot larger than char.
char is an integer type that is intended to store a character code from the implementation-defined character set, which is required to be compatible with C's abstract basic character set. (ASCII qualifies, so do the source-charset and execution-charset allowed by your compiler, including the one you are actually using.)
For the sizes and ranges of the integer types (char included), see your <limits.h>. Here is somebody else's limits.h.
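For example, a short program (using only the standard <limits.h> macros) that prints those ranges for your implementation:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("CHAR_BIT  = %d\n", CHAR_BIT);
    printf("CHAR_MIN  = %d, CHAR_MAX  = %d\n", CHAR_MIN, CHAR_MAX);
    printf("SCHAR_MIN = %d, SCHAR_MAX = %d, UCHAR_MAX = %d\n",
           SCHAR_MIN, SCHAR_MAX, UCHAR_MAX);
    printf("INT_MIN   = %d, INT_MAX   = %d\n", INT_MIN, INT_MAX);
    return 0;
}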
Upvotes: 1
Reputation: 689
C was designed as a very low-level language, so it is close to the hardware. Usually, after a bit of experience, you can predict how the compiler will allocate memory, and even pretty accurately what the machine code will look like.
Your intuition is right: it goes back to ASCII. ASCII is really a simple 1:1 mapping from letters (which make sense in human language) to integer values (that can be worked with by hardware); for every letter there is a unique integer. For example, the 'letter' CTRL-A is represented by the decimal number '1'. (For historical reasons, lots of control characters came first, so CTRL-G, which rang the bell on an old teletype terminal, is ASCII code 7. Upper-case 'A' and the 25 remaining UC letters start at 65, and so on. See http://www.asciitable.com/ for a full list.)
C lets you 'coerce' variables into other types. In other words, the compiler cares about (1) the size, in memory, of the var (see 'pointer arithmetic' in K&R), and (2) what operations you can do on it.
If memory serves me right, you can't do arithmetic on a char. But, if you call it an int, you can. So, to convert all LC letters to UC, you can do something like:
char letter;
/* ... */
if (letter >= 'a' && letter <= 'z') {   /* letter is lower-case (ASCII) */
    letter = (int) letter - 32;         /* 'a' - 'A' == 32 in ASCII */
}
Some (or most) C compilers would complain if you did not reinterpret the var as an int before adding/subtracting.
But, in the end, the type 'char' is really just another integer type, since ASCII assigns a unique integer to each letter.
Upvotes: -1
Reputation: 4812
getchar is an old C standard function, and the philosophy back then was closer to how the language gets translated to assembly than to type correctness and readability. Keep in mind that compilers did not optimize code as much as they do today. In C, int is the default return type (i.e. if you don't have a declaration of a function, compilers will assume that it returns int), and returning a value is done using a register; therefore returning a char instead of an int actually generates additional implicit code to mask out the extra bytes of the value. Thus, many old C functions prefer to return int.
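As a sketch of the implicit-int rule mentioned above (this relies on C89 behaviour; C99 removed implicit function declarations, so current compilers will at least warn):

/* Compiled as C89, getchar is implicitly declared as "int getchar()"
   even though <stdio.h> is never included. */
int main(void)
{
    int c;

    c = getchar();   /* works because the implicit declaration happens to match */
    return 0;
}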
Upvotes: 1