Jules Lamur

Reputation: 2378

Is over/underflow an undefined behavior at execution time?

I was reading about undefined behavior, and I'm not sure if it's a compile-time-only concept or if it can occur at execution time.

I understand this example well (this is extracted from the Undefined Behavior page of Wikipedia):

An example for the C language:

int foo(unsigned x)
{
    int value = 5;
    value += x;
    if (value < 5)
        bar();
    return value;
}

The value of x cannot be negative and, given that signed integer overflow is undefined behavior in C, the compiler can assume that, at the line of the if check, value >= 5. Thus the if and the call to the function bar can be ignored by the compiler, since the if has no side effects and its condition will never be satisfied. The code above is therefore semantically equivalent to:

int foo(unsigned x)
{
     int value = 5;
     value += x;
     return value;
}

But this happens at compile time.

What if I write, for example:

void foo(int x) {
    if (x + 150 < 5)
         bar();
}

int main() {
    int x;
    std::cin >> x;
    foo(x);
}

and then the user types in INT_MAX - 100 ("2147483547" with 32-bit ints).

There will be an integer overflow, but AFAIK it is the arithmetic logic unit (ALU) of the CPU that overflows, so the compiler is not involved here.

Is it still undefined behavior?

If yes, how does the compiler detect the overflow?

The best I can imagine is the overflow flag of the CPU. If that is the case, does it mean that the compiler can do anything it wants whenever the overflow flag of the CPU is set at execution time?

Upvotes: 6

Views: 1776

Answers (2)

M.M

Reputation: 141648

Your analysis of the first example is incorrect. value += x; is equivalent to:

value = value + x;

In this case value is int and x is unsigned, so the usual arithmetic conversions mean that value is first converted to unsigned. We then have an unsigned addition, which by definition cannot overflow (it has well-defined semantics in accordance with modular arithmetic).

When the unsigned result is assigned back to value, if it is larger than INT_MAX then this is an out-of-range assignment which has implementation-defined behaviour. This is NOT overflow because it is assignment, not an arithmetic operation.

Which optimizations are possible therefore depends on how the implementation defines the behaviour of out-of-range assignment for integers. Modern systems all take the value which has the same 2's complement representation, but historically other systems have done some different things.
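
For illustration, here is a sketch of my own that spells those two steps out explicitly (the variable names are mine; the comments show what you get on a typical 2's complement system):

#include <iostream>
#include <limits>

int main() {
    int value = 5;
    unsigned x = static_cast<unsigned>(std::numeric_limits<int>::max()); // 2147483647 with 32-bit int

    // value += x; is equivalent to the two explicit steps below.
    unsigned sum = static_cast<unsigned>(value) + x; // unsigned addition: well-defined (here 2147483652)
    value = static_cast<int>(sum);                   // 2147483652 > INT_MAX, so this conversion is
                                                     // implementation-defined, not undefined; a typical
                                                     // 2's complement system gives -2147483644

    std::cout << value << '\n';
}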

So the original example does not have undefined behaviour in any circumstance, and the suggested optimization is, for most systems, not possible.


Your second example has nothing to do with your first example since it does not involve any unsigned arithmetic. If x > INT_MAX - 150 then the expression x + 150 causes undefined behaviour due to signed integer overflow. The language definition does not mention ALUs or CPUs so we can be certain that those things are not related to whether or not the behaviour is undefined.

If yes, how does the compiler detect the overflow?

It doesn't have to. Precisely because the behaviour is undefined, it means the compiler is not constrained by having to worry about what happens when there is overflow. It only has to emit an executable that exemplifies the behaviour for the cases which are defined.

In this program those are the inputs in the range [INT_MIN, INT_MAX - 150], so the compiler can transform the comparison to x < -145, because that has the same behaviour for all inputs in the well-defined range; what happens in the undefined cases doesn't matter.
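
As a quick sanity check of my own (not part of the standard's reasoning), you can compare the two conditions using 64-bit arithmetic, so that the addition itself can never overflow, for a few values inside the well-defined range:

#include <cassert>
#include <climits>
#include <cstdint>

int main() {
    // Sample inputs from [INT_MIN, INT_MAX - 150], where x + 150 does not overflow.
    const int samples[] = { INT_MIN, -146, -145, -144, 0, 5, INT_MAX - 150 };

    for (int x : samples) {
        bool original    = static_cast<std::int64_t>(x) + 150 < 5; // evaluated without any overflow
        bool transformed = x < -145;
        assert(original == transformed); // both conditions agree for every defined input
    }
}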

Upvotes: 2

user555045

Reputation: 64913

Yes, but not necessarily in the way I think you might have meant it. That is, if the machine code contains an addition, and at runtime that addition wraps (or otherwise overflows, but on most architectures it would wrap), that is not UB by itself. The UB is solely in the domain of C (or C++). That addition may have been adding unsigned integers, or it may be some optimization that the compiler can make because it knows the semantics of the target platform and can safely rely on wrapping (but you cannot, unless of course you do it with unsigned types).
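
For example (a sketch of mine, not an idiom the standard mandates), if you want wrap-around semantics you can ask for them with unsigned arithmetic instead of relying on what the signed machine instruction happens to do:

// Hypothetical helper: adds two ints with wrap-around semantics.
int wrapping_add(int a, int b) {
    // The unsigned addition is defined to wrap modulo 2^N. Converting the
    // result back to int is implementation-defined when it doesn't fit,
    // and on a typical 2's complement system it yields the wrapped value,
    // so no undefined behaviour is involved at any point.
    return static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
}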

Of course that does not at all mean that it is safe to use constructs that "wrap only at runtime", because those code paths are poisoned at compile time as well. For example, your function

extern void bar(void);

void foo(int x) {
    if (x + 150 < 5)
         bar();
}

is compiled by GCC 6.3 targeting x64 to

foo:
        cmp     edi, -145
        jl      .L4
        ret
.L4:
        jmp     bar

which is the equivalent of

void foo(int x) {
    if (x < -145)
         bar(); // with tail call optimization
}

... which is the same if you assume that signed integer overflow is impossible (in the sense that it places an implicit precondition on the inputs that overflow will not happen).
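
If you want bar() to behave sensibly for every possible int, one option (a variant of my own, not what the compiler generates) is to make that implicit precondition explicit and guard the addition yourself:

#include <climits>

extern void bar(void);

// Variant that makes the precondition explicit: only evaluate x + 150 when it
// cannot overflow. For inputs that would overflow, bar() is simply not called,
// which matches what the optimized x < -145 version does anyway.
void foo_guarded(int x) {
    if (x <= INT_MAX - 150 && x + 150 < 5)
        bar();
}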

Upvotes: 6
