marbens

Reputation: 302

Does a character have to be cast to unsigned char before being compared to getc family return values?

I am not sure whether I have to cast a character to unsigned char before comparing it to the return value of a getc family function.

The functions I consider the getc family are getc, fgetc and getchar.

I am only talking about single-byte characters.

Here is an example without the cast:

#include <stdio.h>

int main(void) {
  int c;

  while ((c = getchar()) != '\n' && c != EOF) // loop until newline or EOF
    putchar(c);

  return 0;
}

Here is an example with the cast:

#include <stdio.h>

int main(void) {
  int c;

  while ((c = getchar()) != (unsigned char)'\n' && c != EOF) // loop until newline or EOF
    putchar(c);

  return 0;
}

On the implementation I use, both work.

Is the cast required for portable programs?

I believe so, because of C11/N1570 7.21.7.1p2, emphasis mine:

If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).

Upvotes: 3

Views: 158

Answers (5)

Luis Colorado

Reputation: 12708

You don't need the cast, because the character \n has a positive value whether char is signed or unsigned. The getc family converts each character from unsigned char to int, so every character read lands in the range [0..255] and never compares equal to EOF, which has a negative value as int. A signed char, by contrast, is converted to int with its value preserved (in practice, sign extension of the sign bit), so an input value \377 can become -1 and make the comparison with EOF give a false match. In both cases the comparison itself is done as int.

As I said, in this particular case you don't need the explicit cast to unsigned char, because the character being compared is '\n', which is positive. For a character constant that may be negative, convert it to unsigned char first, so that the widening to int happens on a nonnegative value. That is the case where you have to be a bit careful.
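As a rough illustration of the point above, here is a minimal sketch. The helper names as_getc_value and matches are made up for this example and are not from the standard library. It assumes 8-bit bytes, and it shows the conversion through unsigned char giving the value that getc would report, whether plain char is signed or unsigned.

```c
#include <stdio.h>

/* Hypothetical helper (assumption, not a library function): map a plain
   char to the int value that getc would report for the same byte. */
static int as_getc_value(char ch)
{
    /* The conversion to unsigned char happens first, so the result is
       never negative, even if plain char is signed. */
    return (unsigned char)ch;
}

/* Hypothetical helper: compare a getc result against a character. */
static int matches(int c, char ch)
{
    return c == as_getc_value(ch);
}
```

With these helpers, matches(c, '\n') behaves exactly like c == '\n', while matches(c, '\377') can no longer be confused with c == EOF.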

Upvotes: 0

Ted Lyngmo

Reputation: 117851

  • '\n' is, in C, an int
  • getchar() returns an int
  • c is an int

You never need to cast an int to int so your first loop is what it should look like.

EOF expands to an integer constant expression with type int and a negative value, so it cannot conflict with a char (in the most common cases) that has been converted to unsigned char. Its most common value is -1. A successfully read char with the common range [-128, 127], once converted to unsigned char, falls in the range [0, 255], so the two cannot conflict.

There are exceptions, such as when the char based types have the same size as int. In that case, the added cast would still be pointless.

C23 7.23.7.6 The getchar function:

The getchar function returns the next character from the input stream pointed to by stdin. If the stream is at end-of-file, the end-of-file indicator for the stream is set and getchar returns EOF. If a read error occurs, the error indicator for the stream is set and getchar returns EOF.

Hence, you can still use feof(stdin) || ferror(stdin) to make the EOF check safely on such systems.
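A minimal sketch of that check, assuming a helper named count_until_newline (a made-up name for this example) that reads from any FILE *. A return value equal to EOF is only treated as the end of input when feof or ferror confirms it:

```c
#include <stdio.h>

/* Hypothetical helper: count the characters read before a newline or the
   end of input. EOF is trusted only if feof() or ferror() confirms it. */
static long count_until_newline(FILE *in)
{
    long n = 0;

    for (;;) {
        int c = getc(in);

        if (c == EOF && (feof(in) || ferror(in)))
            break;              /* real end of file or read error */
        if (c == '\n')
            break;              /* end of line; no cast needed */
        n++;
    }
    return n;
}
```

On ordinary systems where int is wider than char, the extra feof/ferror test is redundant but harmless.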

Regarding the value of \n:

Paragraph 6.2.5 (Types) in the C23 standard

  1. An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.

So, '\n' which is one of the characters in the basic execution character set can't be negative.

Upvotes: 2

chux

Reputation: 154208

When the character constant is positive or 0 (such as '\n', which is always positive *1), the cast is not needed.

When the character constant is negative, a cast is useful depending on coding goals.


*1 §6.2.5 ¶3 sentence 2 of the ISO C11 standard states the following: "If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative." §5.2.1 defines the basic execution character set to include the newline character.

Upvotes: 0

Eric Postpischil

Reputation: 224082

The C standard guarantees that character constants for these characters have nonnegative values (see footnote 1):

  • the Latin alphabet letters, A to Z and a to z,
  • the digits 0 to 9,
  • the graphic characters !, ", #, %, &, ', (, ), *, +, ,, -, ., /, :, ;, <, =, >, ?, [, \, ], ^, _, {, |, }, and ~,
  • the space character, and
  • control characters representing horizontal tab, vertical tab, form feed, alert, backspace, carriage return, and new line.

This follows from several sections of the C standard:

  • 5.2.1 2 and 3 specify the characters of the basic execution character set.
  • 6.2.5 3 says any character of the basic execution character set stored in a char is nonnegative.
  • 6.4.4.4 10 says the value of a character constant, such as 'x', containing a single character (including a single character resulting from an escape sequence, like '\n') is its value as a char converted to int.

The nonnegative char values are always a subset of the unsigned char values, so each character constant of one of these characters will have the same value as the value returned by getc when reading the same character.

If you need to handle other characters and cannot ensure those characters have nonnegative values in your target platforms, then you should convert the character constants to unsigned char.
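One way to check that assumption at build time, rather than cast everywhere, is a C11 static assertion. This is only a sketch: the choice of '@' as the example character and the helper name is_at_sign are assumptions made for illustration. It relies on character constants being integer constant expressions.

```c
#include <stdio.h>

/* Guaranteed by the standard: '\n' is in the basic execution character set. */
_Static_assert('\n' >= 0, "newline must be nonnegative");

/* Not guaranteed: '@' is outside the basic set, so verify it on this target.
   If this fails, compare against (unsigned char)'@' instead. */
_Static_assert('@' >= 0, "'@' is negative here; convert it to unsigned char");

/* Hypothetical helper: safe without a cast once the assertion above holds. */
static int is_at_sign(int c)
{
    return c == '@';
}
```

If the assertion fails on some target, the build stops with the message instead of the comparison silently never matching.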

Footnote

1 There is one pedantic exception to this which does not occur in practice. In a C implementation in which char and int are the same width and char is unsigned, char may have values not representable in int. In this case, the conversion is implementation-defined, so it may produce negative values. This conversion would be the same for converting the unsigned char value to int for the character constant and for converting the unsigned char getc return value to int, so they would produce the same value for the same characters. Conceivably, the conversion might be defined to clamp instead of wrap, which would make multiple characters map to the same value and be impossible to distinguish. This would be a defect in the C implementation, and there would not be a way to work around it using only the features fully specified by the C standard.

Upvotes: 3

chqrlie

Reputation: 145287

The cast would be required if the character constant is negative, because getc() returns byte values in the range of type unsigned char or the special negative value EOF, so negative character values are never returned by getc() for actual bytes present in the file.

On an architecture where char is signed by default, character constants that represent characters with a negative value, such as '\377', have this negative value (with type int). Thus '\377' has the value -1 and would not match a byte in the file with all bits set, but would mistakenly match EOF.

The C Standard specifies that all roman letters, digits, punctuation characters used for C operators and white space characters, including TAB, '\n' and other control characters are positive. Thus the newline character '\n' is positive and can be compared to c directly. The @ character used for email addresses is not in the list, so a perverse system that does not use ASCII might make this character negative and prevent c == '@' from matching successfully.

In your example, (c = getchar()) != '\n' is fully portable and does not require any cast.

Conversely, if you wish to match a non-breaking space '\240' (aka '\xa0'), the test c == '\240' would not work on systems where char is signed by default. To write portable code, you must indeed cast the character constant to unsigned char for such values:

#include <stdio.h>

int main(void) {
    int c;

    while ((c = getchar()) != EOF) {
        if (c == (unsigned char)'\240') {
            printf("NBSP found\n");
        }
        if (c == (unsigned char)'\377') {
            printf("0xFF found\n");
        }
    }
    return 0;
}

Upvotes: 1
