Reputation: 302
I am not sure whether I have to cast a character to unsigned char before comparing it to the return value of a getc family function.
The functions I consider the getc family are getc, fgetc and getchar.
I am only talking about single-byte characters.
Here is an example without the cast:
#include <stdio.h>

int main(void) {
    int c;
    while ((c = getchar()) != '\n' && c != EOF) // loop until newline or EOF
        putchar(c);
    return 0;
}
Here is an example with the cast:
#include <stdio.h>

int main(void) {
    int c;
    while ((c = getchar()) != (unsigned char)'\n' && c != EOF) // loop until newline or EOF
        putchar(c);
    return 0;
}
On the implementation I use, both work.
Is the cast required for portable programs?
I believe yes, because C11/N1570 7.21.7.1p2, emphasis mine:
If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
Upvotes: 3
Views: 158
Reputation: 12708
You don't need the cast, because the character \n has a positive value whether char is signed or unsigned. getchar returns each character as an unsigned char converted to int, so every character read lies in the range [0..255] and never compares equal to EOF, which has a negative value as int. A character constant, on the other hand, has the value of a char converted to int. Where char is signed, that conversion normally means sign extension, so a value such as \377 can be converted to -1, making the comparison with EOF give a false result. The comparison itself is done as int.
As I say, in this special case you don't need the explicit cast to unsigned char, as the character to be converted is '\n', which is positive. For characters that may be negative, the implicit conversion to int (with sign extension) happens before the comparison, so there you must be a bit careful.
Upvotes: 0
Reputation: 117851
- '\n' is, in C, an int
- getchar() returns an int
- c is an int

You never need to cast an int to int, so your first loop is what it should look like.
EOF expands to an integer constant expression with type int and a negative value, so that it is impossible for it to conflict with a char (in the most common cases) that's cast to unsigned char. Its most common value is -1. If a successfully read char has the common range [-128, 127] and is cast to unsigned char, it gets the range [0, 255], which cannot conflict with -1.
There are exceptions, such as when the char-based types have the same size as int. In that case, the added cast would still be pointless.
C23 7.23.7.6 The getchar function:
The getchar function returns the next character from the input stream pointed to by stdin. If the stream is at end-of-file, the end-of-file indicator for the stream is set and getchar returns EOF. If a read error occurs, the error indicator for the stream is set and getchar returns EOF.
Hence, you can still use feof(stdin) || ferror(stdin) to make the EOF check safely on such systems.
Regarding the value of \n:
Paragraph 6.2.5 (Types) in the C23 standard
- An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.
So '\n', which is one of the characters in the basic execution character set, can't be negative.
Upvotes: 2
Reputation: 154208
When the character constant is positive or 0 (such as '\n', which is always positive *1), the cast is not needed.
When the character constant is negative, a cast is useful depending on coding goals.
*1 §6.2.5 ¶3 sentence 2 of the ISO C11 standard states the following: "If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative." §5.2.1 defines the basic execution character set to include the newline character.
Upvotes: 0
Reputation: 224082
The C standard guarantees that character constants for these characters have nonnegative values [1]: A to Z and a to z, 0 to 9, and ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~

This follows from several sections of the C standard:
- The value of a member of the basic execution character set stored in a char is nonnegative.
- The value of a character constant such as 'x', containing a single character (including a single character resulting from an escape sequence, like '\n'), is its value as a char converted to int.

The nonnegative char values are always a subset of the unsigned char values, so each character constant of one of these characters will have the same value as the value returned by getc when reading the same character.
If you need to handle other characters and cannot ensure those characters have nonnegative values on your target platforms, then you should convert the character constants to unsigned char.
[1] There is one pedantic exception to this which does not occur in practice. In a C implementation in which char and int are the same width and char is unsigned, char may have values not representable in int. In this case, the conversion is implementation-defined, so it may produce negative values. This conversion would be the same for converting the unsigned char value to int for the character constant and for converting the unsigned char getc return value to int, so they would produce the same value for the same characters. Conceivably, the conversion might be defined to clamp instead of wrap, which would make multiple characters map to the same value and be impossible to distinguish. This would be a defect in the C implementation, and there would not be a way to work around it using only the features fully specified by the C standard.
Upvotes: 3
Reputation: 145287
The cast would be required if the character constant is negative, because getc() returns byte values in the range of type unsigned char or the special negative value EOF, so negative character values are never returned by getc() for actual bytes present in the file.
On an architecture where char is signed by default, character constants that represent characters with a negative value, such as '\377', have this negative value (with type int). Thus '\377' has the value -1 and would not match a byte in the file with all bits set, but would mistakenly match EOF.
The C Standard specifies that all roman letters, digits, punctuation characters used for C operators, and white space characters, including TAB, '\n' and other control characters, are positive. Thus the newline character '\n' is positive and can be compared to c directly. The @ character used for email addresses is not in the list, so a perverse system that does not use ASCII might make this character negative and prevent c == '@' from matching successfully.
In your example, (c = getchar()) != '\n' is fully portable and does not require any cast.
Conversely, if you wish to match a non-breaking space '\240' (aka '\xa0'), the test c == '\240' would not work on systems where char is signed by default. To write portable code, you must indeed cast the character constant to (unsigned char) for such values:
#include <stdio.h>

int main(void) {
    int c;
    while ((c = getchar()) != EOF) {
        if (c == (unsigned char)'\240') {
            printf("NBSP found\n");
        }
        if (c == (unsigned char)'\377') {
            printf("0xFF found\n");
        }
    }
    return 0;
}
Upvotes: 1