Tim

Reputation: 99408

is it safe to subtract between unsigned integers?

The following C code displays the result correctly, -1.

#include <stdio.h>

int main(void)
{
    unsigned x = 1;
    unsigned y = x - 2;

    printf("%d", y);
}
  1. But in general, is it always safe to do subtraction involving unsigned integers?
  2. The reason I ask is that I want to do a conditional check as follows:

    unsigned x = 1; // x was defined by someone else as unsigned,
                    // which I had better not change.
    for (int i = -5; i < 5; i++) {
      if (x + i < 0) continue;
      f(x + i); // f is a function
    }
    

    Is it safe to do so?

  3. How are unsigned integers and signed integers different in representing integers? Thanks!

Upvotes: 5

Views: 6194

Answers (7)

chux

Reputation: 153338

  1. It's always safe to subtract unsigned values, as in

    unsigned x = 1;
    unsigned y=x-2;
    

    y will take on the value of -1 mod (UINT_MAX + 1), which is UINT_MAX.
    It is always safe to do subtraction, addition, and multiplication involving unsigned integers: there is no undefined behavior (UB). The result will always be the expected mathematical result reduced modulo UINT_MAX + 1.

    But do not do printf("%d", y ); - that is UB. Instead use a matching specifier: printf("%u", y);

    C11 §6.2.5 9 "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type."

  2. When unsigned and int are used in addition, the int is converted to unsigned. So x+i has an unsigned result, and that sum is never < 0. Safe, but if (x+i<0) continue is then pointless. f(x+i); is safe too, but one would need to see f()'s prototype to explain exactly what may happen.

  3. Unsigned integers always range from 0 to 2^N - 1 (where N is the number of value bits) and have well-defined "overflow" results. Signed integers are, as of C23, only two's complement; historically they could also be ones' complement or sign-magnitude. All have UB on overflow, and some compilers take advantage of that by assuming overflow never occurs when generating optimized code.

Upvotes: 5

supercat

Reputation: 81105

An unsigned integer type should be thought of not as representing a number, but as a member of something called an "abstract algebraic ring", specifically the equivalence classes of integers congruent modulo (MAX_VALUE+1). For numerical brevity, I'll assume "unsigned int" is 16 bits in the examples; the principles would be the same with 32 bits, but all the numbers would be bigger.

Without getting too deep into the abstract-algebraic nitty-gritty: when assigning a number to an unsigned type [abstract algebraic ring], zero maps to the ring's additive identity (so adding zero to a value yields that value), and one maps to the ring's multiplicative identity (so multiplying a value by one yields that value). Adding a positive integer N to a value is equivalent to adding the multiplicative identity N times; adding a negative integer -N, or subtracting a positive integer N, yields the value which, when added to +N, would produce the original value.

Thus, assigning -1 to a 16-bit unsigned integer yields 65535, precisely because adding 1 to 65535 will yield 0. Likewise -2 yields 65534, etc.

Note that in an abstract algebraic sense, every integer maps uniquely into an algebraic ring of the indicated form, and a ring member maps uniquely into a smaller ring whose modulus is a factor of its own [e.g. a 16-bit unsigned integer maps uniquely to an 8-bit unsigned integer], but ring members are not uniquely convertible to larger rings or to integers. Unfortunately, C sometimes pretends that ring members are integers and implicitly converts them; that can lead to some surprising behavior.

Subtracting a value, signed or unsigned, from an unsigned value which is no smaller than int, and no smaller than the value being subtracted, will yield a result according to the rules of algebraic rings, rather than the rules of integer arithmetic. Testing whether the result of such computation is less than zero will be meaningless, because ring values are never less than zero. If you want to operate on unsigned values as though they are numbers, you must first convert them to a type which can represent numbers (i.e. a signed integer type). If the unsigned type can be outside the range that is representable with the same-sized signed type, it will need to be upcast to a larger type.

Upvotes: 2

Emmet

Reputation: 6391

Rather than really answering your questions directly, which has already been done, I'll make some broader observations that really go to the heart of your questions.

The first is that using unsigned in loop bounds where there's any chance that a signed value might crop up will eventually bite you. I've done it a bunch of times over 20 years and it has ultimately bitten me every time. I'm now generally opposed to using unsigned for values that will be used for arithmetic (as opposed to being used as bitmasks and such) without an excellent justification. I have seen it cause too many problems when used, usually with the simple and appealing rationale that "in theory, this value is non-negative and I should use the most restrictive type possible".

I understand that x, in your example, was decided to be unsigned by someone else, and you can't change it, but you want to do something involving x over an interval potentially involving negative numbers.

The “right” way to do this, in my opinion, is first to assess the range of values that x may take. Suppose that the length of an int is 32 bits. Then the length of an unsigned int is the same. If it is guaranteed to be the case that x can never be larger than 2^31-1 (as it often is), then it is safe in principle to cast x to a signed equivalent and use that, i.e. do this:

int y = (int)x;

// Do your stuff with *y*

x = (unsigned)y;

If you have a long that is longer than unsigned, then even if x uses the full unsigned range, you can do this:

long y = (long)x;

// Do your stuff with *y*

x = (unsigned)y;

Now, the problem with either of these approaches is that before assigning back to x (e.g. x=(unsigned)y; in the immediately preceding example), you really must check that y is non-negative. However, these are exactly the cases where working with the unsigned x would have bitten you anyway, so there's no harm at all in something like:

long y = (long)x;

// Do your stuff with *y*

assert( y >= 0L );
x = (unsigned)y;

At least this way, you'll catch the problems and find a solution, rather than having a strange bug that takes hours to find because a loop bound is four billion unexpectedly.

Upvotes: 4

Amaury Medeiros

Reputation: 2233

No, it's not safe. Trying to represent negative numbers with unsigned ints smells like a bug. Also, you should use %u to print unsigned ints.

If we slightly modify your code to put %u in printf:

#include <stdio.h>

int main(void)
{
    unsigned x = 1;
    unsigned y = x - 2;

    printf("%u", y);
}

The number printed is 4294967295, which is UINT_MAX on a system with a 32-bit unsigned int.

Upvotes: 2

M.M

Reputation: 141544

1: Yes, it is safe to subtract unsigned integers. The definition of arithmetic on unsigned integers includes that if an out-of-range value would be generated, then that value should be adjusted modulo the maximum value for the type, plus one. (This definition is equivalent to truncating high bits).

Your posted code has a bug though: printf("%d", y); causes undefined behaviour because %d expects an int, but you supplied unsigned int. Use %u to correct this.

2: When you write x+i, the i is converted to unsigned. The result of the whole expression is a well-defined unsigned value. Since an unsigned can never be negative, your test will always fail.

You also need to be careful using relational operators because the same implicit conversion will occur. Before I give you a fix for the code in section 2, what do you want to pass to f when x is UINT_MAX or close to it? What is the prototype of f ?

3: Unsigned integers use a "pure binary" representation.

Signed integers have three options. Two can be considered obsolete; the most common one is two's complement. All options require that a positive signed integer value has the same representation as the equivalent unsigned integer value. In two's complement, a negative signed integer is represented the same as the unsigned integer generated by adding UINT_MAX+1, etc.

If you want to inspect the representation, then do unsigned char *p = (unsigned char *)&x; printf("%02X%02X%02X%02X", p[0], p[1], p[2], p[3]);, depending on how many bytes are needed on your system.

Upvotes: 7

chrk

Reputation: 4215

No, it's not safe.

Integers are usually 4 bytes long, which equals 32 bits. The difference in representation is:

As far as signed integers are concerned, the most significant bit is used for the sign, so they can represent values between -2^31 and 2^31 - 1.

Unsigned integers don't use any bit for sign, so they represent values from 0 to 2^32 - 1.


Part 2 isn't safe either, for the same reason as Part 1. Since int and unsigned represent integers differently, once negative values enter the calculation you can't rely on x + i having the value you expect.

Upvotes: 2

tkdkop

Reputation: 129

The reason the result looks correct is that C doesn't do any overflow checks and you are printing the value as a signed int (%d). This, however, does not mean it is safe practice. If you print it as it really is (%u), you won't get -1.

Upvotes: 1
