Why should the result type of an operation match the type of its two operands?

Question

I'm new to C and I was reading a textbook which shows some pieces of buggy code of a function that determines whether one string is longer than another:

#include <string.h>

int strlonger(char *s, char *t) {
    return strlen(s) - strlen(t) > 0;
}

It is buggy because the return type of strlen is an unsigned integer type, so when the mathematical result of the subtraction is negative, it is converted to the unsigned type and therefore produces an incorrect result. For example, -1 becomes the maximum unsigned value, which is of course greater than 0.

It seems that the result of strlen(s) - strlen(t) is also an unsigned integer, but why does it have to be this way? For example, 0u - 1u is mathematically -1, and -1 is a signed integer, so I would expect C to keep the value -1 without converting it back to unsigned, because I'm not writing code like:

...
unsigned int result = strlen(s) - strlen(t);
return result > 0;

Or does C have some special rule that the result type of an operation on two operands should match the type of the operands?

Eric Postpischil · Accepted Answer

… why it has to be in this way?

Theoretically, it does not have to be this way, but that is how it is designed in C.

The specific rule is in C 2018 6.3.1.8, “Usual arithmetic conversions”:

Many operators that expect operands of arithmetic type cause conversions and yield result types in a similar way. The purpose is to determine a common real type for the operands and result. For the specified operands, each operand is converted, without change of type domain, to a type whose corresponding real type is the common real type. Unless explicitly stated otherwise, the common real type is also the corresponding real type of the result, whose type domain is the type domain of the operands if they are the same, and complex otherwise… [Emphasis added.]

This passage tells us that, if x and y have type unsigned int, the result of x - y is also unsigned int. (It goes on to present further rules about cases where the operands have mixed types, such as one double and one int, or have narrow integer types, such as char, but the situation is simple for unsigned int operands.)
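To make the rule concrete, here is a minimal sketch of what it means for unsigned int operands. The helper names diff_unsigned and is_positive_difference are hypothetical, chosen only for illustration. Because the difference has type unsigned int, it wraps around instead of going negative, so the comparison with 0 is true whenever the operands differ.

```c
#include <limits.h>

/* Hypothetical helper: both operands are unsigned int, so the result of
   x - y is unsigned int as well and wraps modulo UINT_MAX + 1 when y > x. */
unsigned int diff_unsigned(unsigned int x, unsigned int y) {
    return x - y;
}

/* Hypothetical helper mirroring the test in the question. The int constant 0
   is converted to unsigned int, so this is true whenever x != y. */
int is_positive_difference(unsigned int x, unsigned int y) {
    return x - y > 0;
}
```

With these definitions, diff_unsigned(0u, 1u) yields UINT_MAX rather than -1, and is_positive_difference(0u, 1u) yields 1.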

Why is this a good design? Consider what the result of subtracting two unsigned int values can be. Let M be the maximum value of an unsigned int. The minimum value is of course 0. If x is M and y is 0, x - y is M. If x is 0 and y is M, x - y is −M. So the result of x - y can be anything from −M to +M. If unsigned int has N bits, we would need a signed integer with N+1 bits in order to hold any potential result.

If we defined the result type of an operation to be a type that could hold any mathematical result, this design would be unworkable. Subtracting two 32-bit unsigned int would require a 33-bit integer type. And adding or subtracting two of those would require a 34-bit type. And further operations would require wider and wider types. Not only does hardware generally not support the necessary widths, but doing calculations in a loop would require dynamic types that grow with each iteration of the loop. (And this is considering only addition and subtraction. With multiplication, the required sizes would grow even faster.)

So, our design has to use fixed sizes for the result types. What should they be? Whether we define the result of subtracting two unsigned int values to be unsigned int or int, only some of the possible mathematical results can be represented in the result type. For the most part, it is simpler to say the result type is the same as the operand types. It is up to the programmer to ensure they stay within the bounds of the type or, if they want something different, to write the code to get the result they want.

As an example, if you know size_t is unsigned int in your C implementation, and you know long int is wider, then you can write (long int) x - y. This explicitly converts x to the wider (and signed) type. y is also implicitly converted, by the rule cited above, and the result is produced in the type long int. Then there will be no overflow regardless of the values of x and y.
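A sketch of that idea follows, under some assumptions. It uses unsigned int operands and long long instead of long int, because long int is no wider than unsigned int on some implementations. The function name signed_difference is hypothetical, and the _Static_assert (C11) checks the width assumption at compile time.

```c
#include <limits.h>

/* Assumption for this sketch: long long can represent every unsigned int
   value. The build fails if the implementation does not satisfy this. */
_Static_assert(LLONG_MAX >= UINT_MAX,
               "long long must be able to hold every unsigned int value");

/* Hypothetical helper: the cast converts x to long long, and the usual
   arithmetic conversions then convert y to long long too, so the
   subtraction is performed in the wider signed type. */
long long signed_difference(unsigned int x, unsigned int y) {
    return (long long) x - y;
}
```

Here signed_difference(0u, 1u) is -1, and every result from −M to +M is representable.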

In summary, it is not feasible for the compiler to manage types to avoid overflows, so it is left to the programmer to do this.
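For the function in the question, one possible fix (a sketch, not the only option) is to avoid the subtraction entirely and compare the two lengths directly. The const qualifiers on the parameters are an addition for illustration and are not in the original code.

```c
#include <string.h>

/* Returns 1 if s is longer than t, 0 otherwise. Both lengths have type
   size_t, and comparing them directly involves no subtraction, so nothing
   can wrap around. */
int strlonger(const char *s, const char *t) {
    return strlen(s) > strlen(t);
}
```

With this version, strlonger("ab", "abc") is 0, where the original returned 1.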
