I'm new to C, and I was reading a textbook that shows a buggy version of a function meant to determine whether one string is longer than another:
#include <string.h>

int strlonger(char *s, char *t) {
    return strlen(s) - strlen(t) > 0;
}
It is buggy because the return type of strlen is an unsigned integer type, so when the difference would be negative, it is converted to the unsigned type instead, which produces an incorrect result. For example, -1 becomes the maximum unsigned value, which is of course greater than 0.
It seems that the result of strlen(s) - strlen(t) is also an unsigned integer, but why does it have to be this way? I mean for example, 0u-1u is -1, -1 is a signed integer, so C should keep this value -1 without casting it back to unsigned, because I'm not writing code like:
...
unsigned int result = strlen(s) - strlen(t);
return result > 0;
Or does C have some special rule that the result of an operation on two operands must have the same type as the operands?
Upvotes: 1
Views: 111
Reputation: 224596
… why it has to be in this way?
Theoretically, it does not have to be this way, but that is how it is designed in C.
The specific rule is in C 2018 6.3.1.8, “Usual arithmetic conversions”:
Many operators that expect operands of arithmetic type cause conversions and yield result types in a similar way. The purpose is to determine a common real type for the operands and result. For the specified operands, each operand is converted, without change of type domain, to a type whose corresponding real type is the common real type. Unless explicitly stated otherwise, the common real type is also the corresponding real type of the result, whose type domain is the type domain of the operands if they are the same, and complex otherwise… [Emphasis added.]
This passage tells us that, if x and y have type unsigned int, the result of x - y is also unsigned int. (It goes on to present further rules about cases where the operands have mixed types, such as one double and one int, or have narrow integer types, such as char, but the situation is simple for unsigned int operands.)
Why is this a good design? Consider what the result of subtracting two unsigned int values can be. Let M be the maximum value of an unsigned int. The minimum value is of course 0. If x is M and y is 0, x - y is M. If x is 0 and y is M, x - y is −M. So the result of x - y can be anything from −M to +M. If unsigned int has N bits, we would need a signed integer with N+1 bits in order to hold any potential result.
If we defined the result type of an operation to be a type that could hold any mathematical result, this design would be unworkable. Subtracting two 32-bit unsigned int values would require a 33-bit integer type. And adding or subtracting two of those would require a 34-bit type. And further operations would require wider and wider types. Not only does hardware generally not support the necessary widths, but doing calculations in a loop would require dynamic types that grow with each iteration of the loop. (And this is considering only addition and subtraction. With multiplication, the required sizes would grow even faster.)
So, our design has to use fixed sizes for the result types. What should they be? Whether we define the result of subtracting two unsigned int values to be unsigned int or int, only some of the possible mathematical results can be represented in the result type. For the most part, it is simpler to say the result type is the same as the operand types. It is up to the programmer to ensure the results stay within the bounds of the type or, if they want something different, to write the code to get the result they want.
As an example, if you know size_t is unsigned int in your C implementation, and you know long int is wider, then you can write (long int) x - y. This explicitly converts x to the wider (and signed) type. y is also implicitly converted, by the rule cited above, and the result is produced in the type long int. Then there will be no overflow regardless of the values of x and y.
In summary, it is not feasible for the compiler to manage types to avoid overflows, so it is left to the programmer to do this.
Upvotes: 0
Reputation: 68089
First of all, the result of strlen is size_t, which can be different from unsigned int. It is unsigned, but it can have a different size: https://godbolt.org/z/GrB5_z
The result of the operation has to fit in the result type, so the most sensible choice is the type of the operands (assuming both operands have the same type).
What to do? Just change the logic of your function:
#include <string.h>

int strlonger(const char *s, const char *t)
{
    /* Compare the lengths directly: no subtraction, so no unsigned wraparound. */
    return strlen(s) > strlen(t);
}
Even the function name suggests this approach.
I mean for example, 0u-1u is -1, -1 is a signed integer
No, it is not. When you subtract 1u from 0u, the unsigned arithmetic wraps around, so the result has all bits set (for a 32-bit unsigned int it is 0xffffffff, i.e. UINT_MAX).
Upvotes: 4