Reputation: 155
I looked at many implementations of strCmp() and found that most of the pointer-based implementations use unsigned char.
My question is: why is "unsigned" used in the return statement, even though I get the same result without it (based on the tests I did)?
If I don't use it, will I get a wrong result for some values?
Lastly, is char unsigned or signed by default?
Example 1
int strCmp(const char* s1, const char* s2)
{
    while(*s1 && (*s1 == *s2))
    {
        s1++;
        s2++;
    }
    return *(const unsigned char*)s1 - *(const unsigned char*)s2;
}
Example 2
int strCmp(const char *S1, const char *S2)
{
    for(; *S1 == *S2; ++S1, ++S2)
        if(*S1 == 0)
            return 0;
    return *(unsigned char *)S1 < *(unsigned char *)S2 ? -1 : 1;
}
Upvotes: 4
Views: 463
Reputation: 154243
I looked at many implementations of strCmp() and found that most of the pointer-based implementations use unsigned char.
Code that implements the standard C library function int strcmp(const char *s1, const char *s2); is specified to perform the comparison as if the strings were composed of unsigned char characters. This applies whether plain char behaves like signed char or like unsigned char.
For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char
C11 §7.24.1 ¶3
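As a quick check of that rule, here is a minimal sketch that asks the library strcmp() how it orders a byte with the high bit set. The helper name high_byte_sorts_last is hypothetical, and the sketch assumes 8-bit bytes.

```c
#include <string.h>

/* Hypothetical helper (not part of any library).
 * Returns 1 if strcmp() orders the byte 0x80 after the byte 0x7F,
 * which is what comparing as unsigned char requires (128 > 127).
 * Comparing as signed 8-bit char would instead see -128 < 127. */
int high_byte_sorts_last(void)
{
    const char *a = "\x80";
    const char *b = "\x7f";
    return strcmp(a, b) > 0;
}
```

On a conforming implementation this should return 1 whether plain char is signed or unsigned.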
// Example that performs the correct compare without a possibility of overflow.
int strCmp(const char* s1, const char* s2) {
    const unsigned char *u1 = (const unsigned char *) s1;
    const unsigned char *u2 = (const unsigned char *) s2;
    while((*u1 == *u2) && *u1) {
        u1++;
        u2++;
    }
    return (*u1 > *u2) - (*u1 < *u2);
}
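The version below fails on rare machines where the range of unsigned char equals the range of unsigned. There the operands promote to unsigned instead of int, so the subtraction wraps around and the result, once converted to int, does not reliably carry the correct sign.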
return *(const unsigned char*)s1 - *(const unsigned char*)s2;
Upvotes: 1
Reputation:
My question is: why is "unsigned" used in the return statement, even though I get the same result without it (based on the tests I did)?
Arithmetic is done in type int, so if char is signed, you will get wrong values for negative chars because they are sign-extended.
Example: assume your chars are 8 bits wide and signed with 2's complement. Then the character at code point 128 would have an integer value of -128 and would therefore compare smaller than any character in the range [0,127], which is not what you want. Casting to unsigned char first makes sure the integer value is 128.
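To make the difference concrete, here is a small sketch with two hypothetical helpers, diff_plain and diff_unsigned, that subtract the first characters of two strings with and without the cast. It assumes 8-bit chars and an int wider than char.

```c
/* Hypothetical helpers for illustration only. */

/* Subtracts the first characters as plain char. If char is signed,
 * a byte such as 0x80 is promoted to int as -128. */
int diff_plain(const char *s1, const char *s2)
{
    return *s1 - *s2;
}

/* Subtracts the first characters as unsigned char. The byte 0x80
 * is promoted to int as 128, whatever the signedness of char. */
int diff_unsigned(const char *s1, const char *s2)
{
    return *(const unsigned char *)s1 - *(const unsigned char *)s2;
}
```

With "\x80" and "\x01" as arguments, diff_unsigned gives 128 - 1 = 127 (positive, as strcmp requires). diff_plain gives -128 - 1 = -129 where char is signed, which has the wrong sign, and 127 where char is unsigned. Tests that only use ASCII text never hit this case, which would explain seeing the same result both ways.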
Lastly, is char unsigned or signed by default?
This is, in fact, implementation-defined. So explicitly use unsigned char to be sure.
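If you want to know what your own implementation chose, one way is to look at CHAR_MIN from <limits.h>. A minimal sketch, where plain_char_is_signed is a hypothetical helper name:

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is signed on this
 * implementation, 0 if it is unsigned. CHAR_MIN is 0 when char is
 * unsigned and negative (equal to SCHAR_MIN) when it is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Portable code should not depend on the answer. The check is only useful for seeing which behavior a given compiler and target use.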
Upvotes: 9