Mohan

Reputation: 155

Using unsigned char and char

I looked at many implementations of strCmp() and found that most of the pointer-based implementations use unsigned char.

My question is: why is "unsigned" used in the return statement, even though omitting it gives the same result (based on the tests I did)?

If I don't use it, will I get a wrong result for some values?

Lastly, is char unsigned or signed by default?

Example 1

int strCmp(const char* s1, const char* s2)
{
    while(*s1 && (*s1 == *s2))
    {
        s1++;
        s2++;
    }    
    return *(const unsigned char*)s1 - *(const unsigned char*)s2;
}

Example 2

int strCmp(const char *S1, const char *S2)
{
  for(; *S1 == *S2; ++S1, ++S2)
    if(*S1 == 0)
      return 0;
  return *(unsigned char *)S1 < *(unsigned char *)S2 ? -1 : 1;
}

Upvotes: 4

Views: 463

Answers (2)

chux

Reputation: 154243

I looked at many implementations of strCmp() and found that most of the pointer-based implementations use unsigned char.

The standard C library function int strcmp(const char *s1, const char *s2); is specified to perform the comparison as if the strings were composed of unsigned char characters. This applies whether plain char is signed or unsigned on the implementation.

"For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char" (C11 §7.24.1, paragraph 3)
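As a quick check of that rule, here is a minimal sketch. The helper name high_byte_sorts_after_ascii is made up for illustration, and it assumes 8-bit bytes. Because the library compares bytes as unsigned char, the byte 0x80 (128) must order after 'a' (97), regardless of whether plain char is signed.

```c
#include <string.h>

/* Hypothetical helper (not from the standard library): returns 1 if the
   library strcmp orders the byte 0x80 after the letter 'a'. */
int high_byte_sorts_after_ascii(void) {
    const char high[] = "\x80";  /* 128 when read as unsigned char */
    const char low[]  = "a";     /* 97 */
    return strcmp(high, low) > 0;
}
```

On a conforming implementation this should return 1.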

// Example that performs the correct compare without a possibility of overflow.
int strCmp(const char* s1, const char* s2) {
  const unsigned char *u1 = (const unsigned char *) s1;
  const unsigned char *u2 = (const unsigned char *) s2;
  while((*u1 == *u2) && *u1) {
    u1++;
    u2++;
  }    
  return (*u1 > *u2) - (*u1 < *u2);
}

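The return statement below fails on rare machines where unsigned char has the same range as unsigned int. There the operands are promoted to unsigned int rather than int, so the subtraction wraps around instead of going negative, and converting the result to int is implementation-defined.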

return *(const unsigned char*)s1 - *(const unsigned char*)s2;
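Whether the subtraction form is safe on a given platform can be checked from limits.h. This is only a sketch, and the function name subtraction_compare_is_safe is an assumption for illustration. The subtraction works as intended when every unsigned char value fits in an int, because both operands are then promoted to int.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 when unsigned char promotes to int,
   i.e. when the difference of two unsigned char values cannot wrap. */
int subtraction_compare_is_safe(void) {
    return UCHAR_MAX <= INT_MAX;
}
```

On common platforms with 8-bit char this returns 1. The (*u1 > *u2) - (*u1 < *u2) form avoids the question entirely.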

Upvotes: 1

user2371524

Reputation:

My question is: why is "unsigned" used in the return statement, even though omitting it gives the same result (based on the tests I did)?

The arithmetic is done in type int, because the operands undergo integer promotion. If char is signed, negative chars are sign-extended during that promotion, so you get wrong values for them.

Example: assume your chars are 8 bits wide and signed, using two's complement. The character at code point 128 then has the integer value -128 and compares smaller than any character in the range [0, 127], which is not what you want. Casting to unsigned char first ensures the integer value is 128.
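To make the example concrete, here is a small sketch. The helper names diff_signed and diff_unsigned are made up for illustration, and it assumes 8-bit two's complement chars. Both helpers read the first byte of each string explicitly as signed char or unsigned char, so the result does not depend on how plain char behaves on your compiler.

```c
/* Difference of the first bytes, read as signed char (the wrong way). */
int diff_signed(const char *a, const char *b) {
    return *(const signed char *)a - *(const signed char *)b;
}

/* Difference of the first bytes, read as unsigned char (what strcmp requires). */
int diff_unsigned(const char *a, const char *b) {
    return *(const unsigned char *)a - *(const unsigned char *)b;
}

/* With 8-bit two's complement chars:
     diff_signed("\x80", "a")   is -128 - 97 = -225  (wrong sign)
     diff_unsigned("\x80", "a") is  128 - 97 =   31  (correct sign)
     diff_signed("a", "b") and diff_unsigned("a", "b") are both -1 */
```

For plain ASCII input both versions agree, which is probably why tests limited to ASCII strings show no difference.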

Lastly, is char unsigned or signed by default?

This is, in fact, implementation-defined: plain char may be either signed or unsigned. So use unsigned char explicitly to be sure.
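If you want to know what your own compiler does, limits.h tells you. A minimal sketch (the function name plain_char_is_signed is an assumption for illustration): CHAR_MIN is negative exactly when plain char is signed, and 0 when it is unsigned.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is signed on this
   implementation, 0 if it is unsigned. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}
```

The answer can differ between platforms and compiler settings, so portable code should not rely on it.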

Upvotes: 9
