hansoko


Confused about getchar function and EOF

#include <stdio.h>

int main()
{
    int c = getchar();

    while (c != EOF) {
        putchar(c);
        c = getchar();
    }

    return 0;
}

The problem is distinguishing the end of input from valid data. The solution is that getchar returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called EOF, for “end of file”. We must declare c to be a type big enough to hold any value that getchar returns. We can't use char since c must be big enough to hold EOF in addition to any possible char. Therefore we use int.

The passage above is from the book 'The C Programming Language'. I have three questions. Firstly, why do I get the output ^\Quit (core dumped) when I press Ctrl and 4 simultaneously while the above program runs? I'm using a GNU/Linux machine.

Secondly, I wrote a program like this:

#include <stdio.h>

int main()
{
    printf("The part before EOF\n");
    putchar(EOF);
    printf("The part after EOF\n");
}

Then I compiled this as 'eof.out', changed int c = getchar(); in the program from the book into char c = getchar();, saved it, and then compiled the program as 'copy.out'. When I run the command ./eof.out | ./copy.out in the terminal, the output I get is:

The part before EOF

This means the program 'copy.out' worked correctly, since it didn't print the output of the second printf. But the passage above from the book indicates that there should've been some kind of failure since I changed the int into char, so what happened?

Thirdly, when I change char c = getchar(); into double c = getchar(); and run the command ./eof.out | ./copy.out, the output I get is:

The part before EOF
�The part after EOF

Why didn't putchar(EOF); stop copy.out? Doesn't a double have more bytes than both int and char? What is happening?


Answers (1)

Eric Postpischil

getchar and putchar work with unsigned char values, not char values, so declaring c to be the char type causes a valid character 255 to be confused with EOF.

To simplify the explanation, this answer assumes a common C implementation, except where stated: char is signed and eight bits, EOF is −1, and conversions to signed integer types wrap modulo 2^w, where w is the width of the type in bits. The C standard permits some variations here, but these assumptions are typical in common C implementations and match the behavior reported in the question.

Consider this code for eof.c from the question:

#include <stdio.h>

int main()
{
    printf("The part before EOF\n");
    putchar(EOF);
    printf("The part after EOF\n");
}

When this program executes putchar(EOF), what happens is:

  • putchar converts EOF to unsigned char. This is specified in C 2018 7.21.7.3 (by way of 7.21.7.7 and 7.21.7.8).
  • Converting −1 to unsigned char yields 255, because conversion to an unsigned eight-bit integer type wraps modulo 256, and −1 + 256 = 255.
  • The character code 255 is written to standard output.

… changed int c = getchar(); in the program from the book into char c = getchar();, saved it, and then compiled the program as 'copy.out'. When I run the command ./eof.out | ./copy.out in the terminal, the output I get is:

The part before EOF

With char c, here is what happens when byte 255 is read and c = getchar() is evaluated:

  • getchar returns 255. Note that it returns the character code as an unsigned char value converted to int, per C 2018 7.21.7.1 (by way of 7.21.7.5 and 7.21.7.6).
  • To assign 255 to c, 255 is converted to the char type. Per the assumption above, this wraps modulo 256, producing −1.

−1 is the value of EOF, so c != EOF is false, so the loop ends, and the program exits.

Why didn't putchar(EOF); stop copy.out? Doesn't a double have more bytes than both int and char? What is happening?

With double c, the value assigned to c is the value returned from getchar; a double can represent every value getchar returns exactly, so the conversion changes nothing. What matters is not how many bytes double has but that it can represent all of those values. When getchar returns the valid character code 255, c is set to 255, the loop continues, and putchar(c) converts c back to int and writes byte 255, which your terminal displayed as �. When getchar returns EOF (−1) at the end of input, c is set to −1, and the loop exits.

… the book indicates that there should've been some kind of failure since I changed the int into char

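The passage from the book does not say there should be some kind of failure. It says EOF is “a value that cannot be confused with any real character”; it does not say you cannot convert EOF to a char. If char is unsigned in your C implementation, the conversion wraps the value modulo 2^w, where w is the number of bits in a char (usually eight, so modulo 256); for example, −1 maps to 255. If char is signed, −1 is representable and is unchanged, but converting an out-of-range code such as 255 is implementation-defined (common implementations wrap it to −1, which is how the collision described above arises). Because putchar converts its argument to unsigned char before writing it, your eof.c program does not output an end-of-file indication when putchar(EOF) is evaluated. Instead, it outputs the character code 255. The failure the book warns about did in fact occur: with char c, copy.out mistook the valid character 255 for EOF and stopped before copying the rest of its input.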
