Reputation: 4445
Most C compilers use signed characters. Most C libraries define EOF as -1.
Despite being a long-time C programmer, I had never put these two facts together, so in the interest of robust, international software I would like some help spelling out the implications.
Here is what I have discovered thus far:
Compare input against character constants cast to unsigned char, e.g. getchar() == (unsigned char) 'µ'.
The <ctype.h> functions are designed to handle EOF and expect unsigned char values. Any other negative input may cause out-of-bounds addressing.
Is this assessment correct, and if so, what other gotchas did I miss?
Full disclosure: I ran into an out-of-bounds indexing bug today when feeding non-ASCII characters to isspace(), and the realization of how many bugs may be lurking in my old code both scared and annoyed me. Hence this frustrated question.
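For concreteness, here is a minimal sketch of the two points above. The helper names count_spaces and matches are made up for illustration, and '\xB5' stands in for 'µ' on the assumption of an ISO 8859-1 execution character set. It is only meant to show where the unsigned char casts go, not to be a complete treatment.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count whitespace bytes in a NUL-terminated string. */
static size_t count_spaces(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        /* Without the cast, a byte >= 0x80 becomes a negative int on
           signed-char platforms, which isspace() is not required to handle. */
        if (isspace((unsigned char)*s))
            ++n;
    }
    return n;
}

/* Hypothetical helper: does the int returned by getchar()/fgetc()
   correspond to the character c? */
static int matches(int ch, char c)
{
    /* getchar() returns the byte as an unsigned char converted to int,
       or EOF, so compare against c converted the same way. */
    return ch != EOF && ch == (unsigned char)c;
}
```

With these, a read loop would look like `int ch; while ((ch = getchar()) != EOF) { if (matches(ch, '\xB5')) ... }`. Note that ch is an int, not a char, so that EOF stays distinguishable from every valid byte.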
Upvotes: 1
Views: 126
Reputation: 239181
The basic execution character set is guaranteed to be nonnegative - the precise wording in C99 is:
"If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative."
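As a rough illustration of what that guarantee does and does not cover, the sketch below checks the sign of characters stored in plain char. The helper names are hypothetical, and '\xB5' is just an example of a byte outside the basic execution character set.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: 1 if plain char is a signed type on this implementation. */
static int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: 1 if every char in s holds a nonnegative value. */
static int all_nonnegative(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (*s < 0)
            return 0;
    }
    return 1;
}
```

Letters, digits and the basic punctuation always pass all_nonnegative(). A string such as "\xB5" passes only where plain char is unsigned, which is exactly the case the guarantee leaves implementation-defined.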
Upvotes: 2