Should signed or unsigned integers be used for sizes?

Question

The standard library vector::size() gives a size_t, an unsigned number. In one of the CppCon talks I have heard somebody (was it Chandler Carruth?) say that this is unfortunate and that it should rather use signed integers.

The background is that the overflow is not defined for signed integers, therefore the compiler has much more leeway. In a talk Carruth showed how a uint8_t as a for loop index in bzip2 creates many more machine instructions on x86 than int8_t because it has to explicitly simulate the overflow with masks and shifts.

In the code that I now work on, there are certain sizes which are strictly positive. These are represented as size_t. This seems decent because this shows that they cannot be negative. On the other hand, there is no need for the defined modular arithmetic, so as long as the signed integer is large enough (we go to like 200), the unsigned integer would have the wrong interface for the arithmetics that we want.

At some point in the code there are loops from 0 to this size. And then the loop indices are subtracted and the absolute value is taken.

When I compiled it with my more modern GCC 7, it could not resolve the proper overload of std::abs because size_t - size_t gives ambigious values, apparently. I have changed the code to use int in the loop indices:

for (int t1 = 0; t1 < Lt; t1++) {
  for (int t2 = 0; t2 < Lt; t2++) {

Now the abs(t1 - t2) works just fine. But the comparison t1 < Lt gives a warning because it is a comparison between signed and unsigned numbers.

What is the right approach?

Use unsigned integers for everthing that is non-negative and then use static_cast() whenever I need to do a substraction.
Use signed integers for loop indices but unsigned integers for the sizes of the containers. Then use static_cast in the comparisons.
Just use signed integers everywhere. When other libraries return unsigned integers, just use static_cast there in order to satisfy the warnings.

MSalters · Accepted Answer

"In a talk Carruth showed how a uint8_t as a for loop index in bzip2 creates many more machine instructions on x86 than int8_t because it has to explicitly simulate the overflow with masks and shifts."

Well, if you can use either type, the for-range must be limited to [0, 127]. Just use int as the index type, then. It is by definition the natural type for basic math operations, and typically maps well to CPU registers.

Using types optimized for minimal storage will not generate the fastest math, no. That is not a surprise. You can't draw conclusions about signed versus unsigned based on such flawed setups.

"size_t - size_t gives ambiguous values"

Well, it doesn't, but it does use modular arithmetic. size_t(1)-size_t(2)==size_t(-1), but size_t(-1) is the largest possible value. This follows directly from the definition of modular math: x-1 < x, except when x-1 wraps around because x==0. (Or equivalently x+1>x except when x+1==0)

Calling abs(size_t(x)) is therefore also pointless since every size_t value is positive. And comparing signed integers against size_t is equally fraught with unintended consequences. Explicit casts are good, as they make the consequences clear.

But there is no universal solution to automatically figure out which cast should be applied. If a mechanical rule could be invented, we could have left that rule to the compiler. We haven't, because we can't. You, as a programmer will have to consider each case numerically.

Should signed or unsigned integers be used for sizes?

Answers (1)

Related Questions