Reputation: 11986
I have a char array holding several characters. I want to compare one of these characters with an unsigned char variable. For example:
char myarr[] = { 20, 14, 5, 6, 42 };
const unsigned char foobar = 133;
myarr[2] = foobar;
if(myarr[2] == foobar){
    printf("You win a shmoo!\n");
}
Is this comparison type safe?
I know from the C99 standard that char, signed char, and unsigned char are three different types (section 6.2.5 paragraph 14).
Nevertheless, can I safely convert between unsigned char and char, and back, without losing precision and without risking undefined (or implementation-defined) behavior?
In section 6.2.5 paragraph 15:
The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.
In section 6.3.1.3 paragraph 3:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
I'm afraid that if char is defined as a signed char, then myarr[2] = foobar could result in an implementation-defined value that will not be converted correctly back to the original unsigned char value; for example, an implementation may always result in the value 42 regardless of the unsigned value involved.
Does this mean that it is not safe to store an unsigned value in a signed variable of the same type?
Also, what is an implementation-defined signal; does this mean an implementation could simply end the program in this case?
In section 6.3.1.1 paragraph 1:
-- The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
-- The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type, if any.
In section 6.2.5 paragraph 8:
For any two integer types with the same signedness and different integer conversion rank (see 6.3.1.1), the range of values of the type with smaller integer conversion rank is a subrange of the values of the other type.
In section 6.3.1.1 paragraph 2:
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int.
In section 6.3.1.8 paragraph 1:
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
The range of char is guaranteed to be the same range as that of signed char or unsigned char, which are both subranges of int and unsigned int respectively as a result of their smaller integer conversion rank.
Since the integer promotion rules dictate that char, signed char, and unsigned char be promoted to at least int before being evaluated, does this mean that char could maintain its "signedness" throughout the comparison?
For example:
signed char foo = -1;
unsigned char bar = 255;
if(foo == bar){
    printf("same\n");
}
Does foo == bar evaluate to a false value, even if -1 is equivalent to 255 when an explicit (unsigned char) cast is used?
UPDATE:
In section J.3.5 paragraph 1 regarding which cases result in implementation-defined values and behavior:
-- The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (6.3.1.3).
For example, could the following code result in implementation-defined behavior since char could be defined as a signed integer type:
char blah = (char)255;
Upvotes: 3
Views: 1899
Reputation: 11986
My original post is rather broad and consists of many specific questions, each of which should have been given its own page. However, I address and answer each question here so future visitors can grok the answers more easily.
Question:
- Is this comparison type safe?
The comparison between myarr[2] and foobar is safe only if the value survives the round trip through char unchanged, which is guaranteed when char behaves as unsigned char or when the value fits in the range of char. In general, however, this is not true.
For example, suppose an implementation defines char to have the same behavior as signed char, and int is able to represent all values representable by unsigned char and signed char.
char foo = -25;
unsigned char bar = foo;
if(foo == bar){
    printf("This line of text will not be printed.\n");
}
Although bar is set equal to foo, and the C99 standard guarantees that there is no loss of precision when converting from signed char to unsigned char (see Answer 2), the foo == bar conditional expression will evaluate false.
This is due to the nature of integer promotion as required by section 6.3.1.1 paragraph 2 of the C99 standard:
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int.
Since in this implementation int can represent all values of both signed char and unsigned char, the values of both foo and bar are converted to type int before being evaluated. Thus the resulting conditional expression is -25 == 231, which evaluates to false.
Question:
- Nevertheless, can I safely convert between unsigned char and char, and back, without losing precision and without risking undefined (or implementation-defined) behavior?
You can safely convert from char to unsigned char without losing precision (nor width nor information), but converting in the other direction, unsigned char to char, can lead to implementation-defined behavior.
The C99 standard makes certain guarantees which enable us to convert safely from char to unsigned char.
In section 6.2.5 paragraph 15:
The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.
Here, we are guaranteed that char will have the same range, representation, and behavior as signed char or unsigned char. If the implementation chooses the unsigned char option, then the conversion from char to unsigned char is essentially that of unsigned char to unsigned char; thus no width nor information is lost and there are no issues.
The conversion for the signed char option is not as intuitive, but is implicitly guaranteed to preserve precision.
In section 6.2.5 paragraph 6:
For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
In 6.2.6.1 paragraph 3:
Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.
In section 6.2.6.2 paragraph 2:
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M <= N).
- signed char is guaranteed to occupy the same amount of storage as an unsigned char, as are all signed integers in respect to their unsigned counterparts.
- unsigned char is guaranteed to have a pure binary representation (i.e. no padding bits and no sign bit).
- signed char is required to have exactly one sign bit, and no more than the same number of value bits as unsigned char.
Given these three facts, we can prove via the pigeonhole principle that the signed char type has at most one less than the number of value bits as the unsigned char type. Similarly, signed char can safely be converted to unsigned char with not only no loss of precision, but no loss of width or information as well:
- unsigned char has a storage size of N bits.
- signed char must have the same storage size of N bits.
- unsigned char has no padding or sign bits and therefore has N value bits.
- signed char can have at most N non-padding bits, and must allocate exactly one bit as the sign bit.
- signed char can have at most N-1 value bits and exactly one sign bit.
All signed char bits therefore match up one-to-one to the respective unsigned char value bits; in other words, for any given signed char value, there is a unique unsigned char representation.
/* binary representation prefix: 0b */
(signed char)(-25) = 0b11100111
(unsigned char)(231) = 0b11100111
Unfortunately, converting from unsigned char to char can lead to implementation-defined behavior. For example, if char is defined by the implementation to behave as signed char, then an unsigned char variable may hold a value that is outside the range of values representable by a signed char. In such cases, either the result is implementation-defined or an implementation-defined signal is raised.
In section 6.3.1.3 paragraph 3:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Question:
- Does this mean that it is not safe to store an unsigned value in a signed variable of the same type?
Trying to convert an unsigned type value to a signed type value can result in implementation-defined behavior if the unsigned type value cannot be represented in the new signed type.
unsigned foo = UINT_MAX;
signed bar = foo; /* possible implementation-defined behavior */
In section 6.3.1.3 paragraph 3:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
An implementation-defined result would be any value within the range of values representable by the new signed type. An implementation could theoretically return the same value consistently (e.g. 42) for these cases, and thus information is lost; i.e. there is no guarantee that converting from unsigned to signed and back to unsigned will result in the same original unsigned value.
An implementation-defined signal is that which conforms to the rules laid out in section 7.14 of the C99 standard; an implementation is permitted to define additional conforming signals which are not explicitly enumerated by the C99 standard.
In this particular case, an implementation could theoretically raise the SIGTERM signal, which requests the termination of the program. Thus, attempting to convert an unsigned type value to a signed type could result in program termination.
Question:
- Does foo == bar evaluate to a false value, even if -1 is equivalent to 255 when an explicit (unsigned char) cast is used?
Consider the following code:
signed char foo = -1;
unsigned char bar = 255;
if((unsigned char)foo == bar){
    printf("same\n");
}
Although signed char and unsigned char values are promoted to at least int before the evaluation of a conditional expression, the explicit unsigned char cast will convert the signed char value to unsigned char before the integer promotions occur. Furthermore, converting to an unsigned value is well-defined in the C99 standard and does not lead to implementation-defined behavior.
In section 6.3.1.3 paragraph 2:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Thus the conditional expression essentially becomes 255 == 255, which evaluates to true.
Question:
- Does this mean that not even an explicit conversion is safe?
In general, an explicit cast to char for a value outside the range of values representable by signed char can lead to implementation-defined behavior (see Answer 3). A conversion need not be implicit for section 6.3.1.3 paragraph 3 of the C99 standard to apply.
Upvotes: 2
Reputation: 5289
I've tested your code, and (signed char)-1 and (unsigned char)255 do not compare equal.
You should convert the signed char to unsigned char first, because unsigned char does not treat the most significant bit as a sign bit.
I have had bad experiences using the signed char type for buffer operations; problems like yours are the result. Also make sure you have turned on all warnings during compilation, and try to fix them.
Upvotes: 0
Reputation: 3974
It has to do with how chars are stored in memory. In an unsigned char, all 8 bits are used to represent the value, while a signed char uses only 7 bits for the number and the 8th bit to represent the sign.
For example, let's take a simpler 3-bit value (I will call this new value type tinychar):
bits  unsigned  signed
000   0          0
001   1          1
010   2          2
011   3          3
100   4         -4
101   5         -3
110   6         -2
111   7         -1
By looking at this chart, you can see the difference in value between a signed and an unsigned tinychar based on how the bits are arranged. Up until you start getting into the negative range, the values are identical for both types. However, once the left-most bit changes to 1, the value becomes negative for the signed type. If you reach the maximum positive value (3) and then add one more, the bit pattern becomes that of the most negative value (-4); if you subtract one from 0, the signed tinychar becomes -1 while an unsigned tinychar wraps around to 7. You can also see that an unsigned 7 and a signed -1 tinychar share the same bit pattern (111), even though their values differ.
Now if you expand this to have a total of 8 bits, you should see similar results.
Upvotes: 1
Reputation: 11706
"does this mean that char
could maintain its 'signedness' throughout the comparison?" Yes; -1 as a signed char will be promoted to a signed int, which will retain its -1 value. As for the unsigned char, it will also keep its 255 value when being promoted, so yes, the comparison will be false. If you want it to evaluate to true, you will need an explicit cast.
Upvotes: 1