Reputation: 35
In this function, why do we need to cast with unsigned char? Can't we cast with char and get the same result, since both have a range of 255? Why choose unsigned char?
Suppose there is no ASCII code equal to -126. I can say the same about 255; both will give you a garbage value. If you tell me we choose it because we are working with bytes, and the maximum value of a byte is 255, I would say we are just comparing. So, in s1 and s2, the result will always be an ASCII code. Why do we choose unsigned char?
#include "libft.h"
int ft_strncmp(const char *s1, const char *s2, size_t n)
{
size_t i;
i = 0;
if (n == 0)
return (0);
while (i < n && (s1[i] != '\0' || s2[i] != '\0'))
{
if (s1[i] != s2[i])
return ((unsigned char)s1[i] - (unsigned char)s2[i]);
i++;
}
return (0);
}
Upvotes: 0
Views: 109
Reputation: 181179
In this function, why do we need to cast with unsigned char?
Because the function is duplicating the behavior of the standard library function strncmp(), which compares the bytes of the arguments as if they have type unsigned char.
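As a small illustration of that rule (a minimal sketch; the byte 0x80 is just an arbitrary value above 127), the sign of a standard strncmp() result is decided by the bytes read as unsigned char, so a string starting with byte 0x80 (128) compares greater than one starting with 'A' (65) on every conforming implementation:

#include <stdio.h>
#include <string.h>

int main(void)
{
    // 0x80 is 128 when read as unsigned char and 'A' is 65, so the
    // standard requires a positive result here, whether or not plain
    // char happens to be signed on this machine.
    printf("%d\n", strncmp("\x80", "A", 1) > 0);   // prints 1
    return 0;
}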
Can't we cast with char and get the same result since both have a range of "255"?
Not reliably, no. The C language specification explicitly allows char to have the same range and behavior as either unsigned char or signed char, and the latter is pretty common. Where the signed char equivalence applies (and supposing 8-bit, two's-complement bytes; C23 mandates two's complement, but not 8-bit bytes), the range of char is -128 to 127.
You could still do the comparison with type char, but that would produce different results on some systems than on others. (Also: the elements are already chars; no casts would be needed to do the comparison in that type.)
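As a minimal sketch of that portability problem (the byte 0xE9 is an arbitrary non-ASCII value), a relational comparison through plain char flips direction depending on whether char is signed on the target, while the unsigned char comparison comes out the same everywhere:

#include <stdio.h>

int main(void)
{
    char c1 = 'a';     // 0x61 = 97
    char c2 = '\xe9';  // 233 as unsigned char, typically -23 where char is signed

    // With plain char the outcome depends on the signedness of char
    // on this particular system.
    printf("as char:          c1 %s c2\n", c1 < c2 ? "<" : ">");

    // With unsigned char the outcome is the same everywhere: 97 < 233.
    printf("as unsigned char: c1 %s c2\n",
           (unsigned char)c1 < (unsigned char)c2 ? "<" : ">");
    return 0;
}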
Why choose unsigned char?
Because that produces the desired order, whereas char might not. And because using unsigned char yields a consistent order across implementations, even if you wanted to implement a different order.
Suppose there is no ASCII code equal to -126. I can say the same about 255; both will give you a garbage value.
ASCII has very little to do with it. C does not assume that char values are specifically ASCII codes. The runtime character set can be different from and incompatible with ASCII -- EBCDIC, say -- and there are machines in use today where that is the case. There is no assumption of or reliance on any particular character set here.
Upvotes: 2
Reputation: 154174
The standard C library performs its string functions as if the characters were unsigned char.
For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value).
As a char may be signed or unsigned, subtracting two char values gives a different result than subtracting two unsigned char values when one of the char values is negative. So casting to unsigned char forms the difference the same way the C library does.
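A minimal sketch of that difference, assuming a typical platform where plain char is signed and 8 bits wide (the byte 0x90 is an arbitrary example):

#include <stdio.h>

int main(void)
{
    const char *s1 = "\x90";  // 144 as unsigned char, typically -112 as signed char
    const char *s2 = "Z";     // 90

    // With the casts the difference is 144 - 90 = 54, positive,
    // matching the library's unsigned char interpretation.
    printf("%d\n", (unsigned char)s1[0] - (unsigned char)s2[0]);

    // Without the casts, a signed-char platform computes -112 - 90 = -202:
    // same strings, opposite sign.
    printf("%d\n", s1[0] - s2[0]);
    return 0;
}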
Pedantic
On rare implementations, char is as wide as int, so subtracting the two values to return a difference with the correct sign risks overflow. Instead, do multiple compares.
With strings on the (nearly obsolete) non-2's-complement formats, ((unsigned char *)s1)[i] can differ from (unsigned char)s1[i]; the pointer form is the preferred one.
The code below fixes both issues:
int ft_strncmp(const char *s1, const char *s2, size_t n) {
    const unsigned char *u1 = (const unsigned char *)s1;
    const unsigned char *u2 = (const unsigned char *)s2;
    size_t i = 0;

    // if (n == 0) // Not needed
    //     return (0);
    while (i < n && (u1[i] != '\0' || u2[i] != '\0')) {
        if (u1[i] != u2[i]) {
            return (u1[i] > u2[i]) - (u1[i] < u2[i]);
        }
        i++;
    }
    return 0;
}
or
int ft_strncmp_alt(const char *s1, const char *s2, size_t n) {
    const unsigned char *u1 = (const unsigned char *)s1;
    const unsigned char *u2 = (const unsigned char *)s2;
    size_t i = 0;

    while (i < n && (u1[i] == u2[i]) && u1[i]) {
        i++;
    }
    if (i == n) {
        return 0;
    }
    return (u1[i] > u2[i]) - (u1[i] < u2[i]);
}
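For a quick sanity check (a hypothetical test driver, not part of the answer's code), both versions can be compared against the library strncmp(); only the sign of each result is meaningful, and it should agree even for inputs containing bytes above 127:

#include <stdio.h>
#include <string.h>

int ft_strncmp(const char *s1, const char *s2, size_t n);
int ft_strncmp_alt(const char *s1, const char *s2, size_t n);

static int sign(int x) { return (x > 0) - (x < 0); }

int main(void)
{
    const char *a = "abc\x90";   // last byte is above 127
    const char *b = "abcZ";

    printf("strncmp:        %d\n", sign(strncmp(a, b, 4)));
    printf("ft_strncmp:     %d\n", sign(ft_strncmp(a, b, 4)));
    printf("ft_strncmp_alt: %d\n", sign(ft_strncmp_alt(a, b, 4)));
    return 0;
}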
Upvotes: 0