NOURDDINE BENYAHYA

Reputation: 35

Differences in Casting: char vs. unsigned char

In this function, why do we need to cast with unsigned char? Can't we cast with char and get the same result since both have a range of "255"? Why choose unsigned char?

Suppose there is no ASCII code equal to -126. I can say the same about 255; both will give you a garbage value. If you tell me we choose it because we are working with bytes, whose maximum value is 255, I would answer that we are only comparing, so the values in s1 and s2 will always be ASCII codes anyway. Why do we choose unsigned char?

#include "libft.h"

int ft_strncmp(const char *s1, const char *s2, size_t n)
{
    size_t  i;

    i = 0;
    if (n == 0)
        return (0);
    while (i < n && (s1[i] != '\0' || s2[i] != '\0'))
    {
        if (s1[i] != s2[i])
            return ((unsigned char)s1[i] - (unsigned char)s2[i]);
        i++;
    }
    return (0);
}

Upvotes: 0

Views: 109

Answers (2)

John Bollinger

Reputation: 181179

In this function, why do we need to cast with unsigned char?

Because the function is duplicating the behavior of the standard library function strncmp(), which compares the bytes of the arguments as if they have type unsigned char.

Can't we cast with char and get the same result since both have a range of "255"?

Not reliably, no. The C language specification explicitly allows char to have the same range and behavior as either unsigned char or signed char, and the latter is pretty common. Where the signed char equivalence applies (and supposing 8-bit bytes and two's complement representation, the latter of which is guaranteed only from C23 on), the range of char is -128 to 127.
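Which of the two behaviors a given implementation picked can be read from <limits.h>. A minimal sketch, where the helper name char_is_signed is made up for illustration:

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is signed on this
   implementation, 0 if it behaves like unsigned char. */
int char_is_signed(void)
{
    return (CHAR_MIN < 0);
}
```

The result depends on the compiler and target ABI, which is exactly why code that relies on the signedness of plain char is not portable.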

You could still do the comparison with type char, but that would produce different results on some systems than on others.

(Also: the elements are already chars. No casts would be needed to do the comparison in that type.)

Why choose unsigned char?

Because that produces the desired order, whereas char might not. And because using unsigned char yields a consistent order across implementations, even if you wanted to implement a different order.
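To make the difference in order concrete, here is a small sketch that reads the first byte of each string once as signed char and once as unsigned char. The helper names are hypothetical, and the numbers in the note below assume 8-bit bytes and two's complement:

```c
/* Hypothetical helpers, for illustration only. */

/* Difference of the first bytes, read as signed char. */
int diff_as_signed(const char *a, const char *b)
{
    return (*(const signed char *)a - *(const signed char *)b);
}

/* Difference of the first bytes, read as unsigned char,
   which is how strncmp() interprets them. */
int diff_as_unsigned(const char *a, const char *b)
{
    return (*(const unsigned char *)a - *(const unsigned char *)b);
}
```

With a = "\xE9" and b = "a", diff_as_signed computes -23 - 97 = -120 (a sorts before b), while diff_as_unsigned computes 233 - 97 = 136 (a sorts after b). Only the second matches the sign that strncmp() returns.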

Suppose there is no ASCII code equal to -126. I can say the same about 255; both will give you a garbage value.

ASCII has very little to do with it. C does not assume that char values are specifically ASCII codes. The runtime character set can be different from and incompatible with ASCII -- EBCDIC, say -- and there are machines in use today where that is the case. There is no assumption of or reliance on any particular character set here.

Upvotes: 2

chux

Reputation: 154174

The standard C library performs string functions as if the characters were unsigned char.

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value).

Since char may be signed or unsigned, subtracting two char values gives a different result than subtracting two unsigned char values when one of them is negative. Casting to unsigned char therefore yields a difference with the same sign as the C library's result.


Pedantic

  • On rare implementations where char and int have the same width, the subtraction cannot reliably return a difference with the correct sign, because the result may not fit in int. Use multiple comparisons instead.

  • On the nearly obsolete non-two's-complement formats, ((unsigned char *)s1)[i] can differ from (unsigned char)s1[i]. The former reads the byte as it is stored and is the preferred form for strings.

The code below fixes both issues:

int ft_strncmp(const char *s1, const char *s2, size_t n) {
  const unsigned char *u1 = (const unsigned char *)s1;
  const unsigned char *u2 = (const unsigned char *)s2;
  size_t  i = 0;
  // if (n == 0)      // Not needed
  //    return (0);
  while (i < n && (u1[i] != '\0' || u2[i] != '\0')) {
    if (u1[i] != u2[i]) {
      return (u1[i] > u2[i]) - (u1[i] < u2[i]);
    } 
    i++;
  }
  return 0;
}

or

int ft_strncmp_alt(const char *s1, const char *s2, size_t n) {
  const unsigned char *u1 = (const unsigned char *)s1;
  const unsigned char *u2 = (const unsigned char *)s2;
  size_t  i = 0;
  while (i < n && (u1[i] == u2[i]) && u1[i]) {
    i++;
  }
  if (i == n) {
    return 0;
  } 
  return (u1[i] > u2[i]) - (u1[i] < u2[i]);
}
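One way to gain some confidence in such a replacement is to compare the sign of its result with the sign of strncmp() on a few inputs. This is only a sketch: cmp_sketch is a renamed local copy of the second version above so the block stands alone, and sign_of and agrees_with_strncmp are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

/* Local copy of the second version above, renamed to keep this
   block self-contained. */
static int cmp_sketch(const char *s1, const char *s2, size_t n)
{
    const unsigned char *u1 = (const unsigned char *)s1;
    const unsigned char *u2 = (const unsigned char *)s2;
    size_t i = 0;

    /* Advance while the bytes match and the strings have not ended. */
    while (i < n && u1[i] == u2[i] && u1[i])
        i++;
    if (i == n)
        return 0;
    return (u1[i] > u2[i]) - (u1[i] < u2[i]);
}

/* Reduce any int to -1, 0 or 1. */
static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Hypothetical check: returns 1 if cmp_sketch() and strncmp()
   agree in sign for the given arguments, 0 otherwise. */
int agrees_with_strncmp(const char *s1, const char *s2, size_t n)
{
    return sign_of(cmp_sketch(s1, s2, n)) == sign_of(strncmp(s1, s2, n));
}
```

Only the sign is compared because the standard specifies just whether the result of strncmp() is negative, zero, or positive, not its magnitude.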

Upvotes: 0
