Reputation: 2378
I was reading about undefined behavior, and I'm not sure if it's a compile-time-only feature, or if it can occur at execution time.
I understand this example well (this is extracted from the Undefined Behavior page of Wikipedia):
An example for the C language:
int foo(unsigned x)
{
    int value = 5;
    value += x;
    if (value < 5)
        bar();
    return value;
}
The value of x cannot be negative and, given that signed integer overflow is undefined behavior in C, the compiler can assume that at the line of the if check, value >= 5. Thus the if and the call to the function bar can be ignored by the compiler, since the if has no side effects and its condition will never be satisfied. The code above is therefore semantically equivalent to:

int foo(unsigned x)
{
    int value = 5;
    value += x;
    return value;
}
But this occurs at compile time.
What if I write, for example:
#include <iostream>

void bar();

void foo(int x) {
    if (x + 150 < 5)
        bar();
}

int main() {
    int x;
    std::cin >> x;
    foo(x);
}
and then the user types in INT_MAX - 100
("2147483547", with 32-bit integers).
There will be an integer overflow, but AFAIK it is the arithmetic logic unit of the CPU that overflows, so the compiler is not involved here.
Is it still undefined behavior?
If yes, how does the compiler detect the overflow?
The best I could imagine is with the overflow flag of the CPU. If this is the case, does it mean that the compiler can do anything it wants whenever the overflow flag of the CPU is set at execution time?
Upvotes: 6
Views: 1776
Reputation: 141648
Your analysis of the first example is incorrect. value += x; is equivalent to:

value = value + x;

In this case value is int and x is unsigned, so the usual arithmetic conversions mean that value is first converted to unsigned; we then have an unsigned addition, which by definition cannot overflow (it has well-defined semantics in accordance with modular arithmetic).

When the unsigned result is assigned back to value, if it is larger than INT_MAX then this is an out-of-range assignment, which has implementation-defined behaviour. This is NOT overflow, because it is assignment, not an arithmetic operation.
Which optimizations are possible therefore depends on how the implementation defines the behaviour of out-of-range assignment for integers. Modern systems all take the value which has the same 2's complement representation, but historically other systems have done some different things.
So the original example does not have undefined behaviour in any circumstance, and the suggested optimization is, for most systems, not possible.
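To make those conversion rules concrete, here is a minimal sketch (my own illustration with assumed values, not part of the original example) for a typical implementation with 32-bit int and 2's complement representation:

#include <climits>
#include <iostream>

int main() {
    int value = 5;
    unsigned x = INT_MAX;   // 2147483647 with 32-bit int

    // value is converted to unsigned, then the addition is performed
    // modulo 2^32: well defined, giving 2147483652u here.
    unsigned sum = value + x;

    // Assigning a value larger than INT_MAX back to an int is the
    // out-of-range assignment described above: implementation-defined,
    // not undefined. On common 2's-complement systems value becomes
    // -2147483644.
    value = sum;

    std::cout << sum << ' ' << value << '\n';
}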
Your second example has nothing to do with your first example since it does not involve any unsigned arithmetic. If x > INT_MAX - 150, then the expression x + 150 causes undefined behaviour due to signed integer overflow. The language definition does not mention ALUs or CPUs, so we can be certain that those things are not related to whether or not the behaviour is undefined.
If yes, how does the compiler detect the overflow?
It doesn't have to. Precisely because the behaviour is undefined, it means the compiler is not constrained by having to worry about what happens when there is overflow. It only has to emit an executable that exemplifies the behaviour for the cases which are defined.
In this program those are the inputs in the range [INT_MIN, INT_MAX - 150], and so the compiler can transform the comparison to x < -145, because that has the same behaviour for all inputs in the well-defined range, and it doesn't matter about the undefined cases.
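If the intent is for the comparison to keep its mathematical meaning for every possible input, one way to express that without any signed overflow is sketched below (function names are mine, not from the answer):

void bar();

// Do the addition in long long, which can represent every int plus 150,
// so no signed overflow can occur for any input.
void foo_wide(int x) {
    if (static_cast<long long>(x) + 150 < 5)
        bar();
}

// Algebraically equivalent for all inputs: no addition, hence no overflow.
// This is essentially the form the compiler produces.
void foo_rewritten(int x) {
    if (x < -145)
        bar();
}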
Upvotes: 2
Reputation: 64913
Yes, but not necessarily in the way I think you might have meant it. If in the machine code there is an addition, and at runtime that addition wraps (or otherwise overflows, but on most architectures it would wrap), that is not UB by itself. The UB is solely in the domain of C (or C++). That addition may have been adding unsigned integers, or it may be part of some optimization that the compiler can make because it knows the semantics of the target platform and can safely rely on wrapping (but you cannot, unless of course you do it with unsigned types).
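To illustrate that last point, here is my own sketch (the helper wrapping_add is hypothetical, not from the answer) of the legal way to rely on wrapping: route the addition through unsigned, where wrap-around is defined.

// Wrap-around addition with no UB: the addition is done in unsigned,
// which wraps modulo 2^N by definition. Converting the result back to
// int is implementation-defined before C++20 (and defined as wrapping
// since C++20), but it is never undefined behaviour.
int wrapping_add(int a, int b) {
    return static_cast<int>(static_cast<unsigned>(a) +
                            static_cast<unsigned>(b));
}

With that, a check like if (wrapping_add(x, 150) < 5) really would call bar for inputs near INT_MAX, matching the wrap-at-runtime intuition from the question.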
Of course that does not at all mean that it is safe to use constructs that "wrap only at runtime", because those code paths are poisoned at compile time as well. For example in your example,
extern void bar(void);

void foo(int x) {
    if (x + 150 < 5)
        bar();
}
Is compiled by GCC 6.3 targeting x64 to
foo:
        cmp     edi, -145
        jl      .L4
        ret
.L4:
        jmp     bar
Which is the equivalent of
void foo(int x) {
    if (x < -145)
        bar(); // with tail call optimization
}
... which is the same if you assume that signed integer overflow is impossible (in the sense that it puts an implicit precondition on the inputs: they must be such that overflow will not happen).
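Put differently, the burden of that implicit precondition falls on the caller. A hypothetical caller (my addition, loosely mirroring the main from the question) that respects it could look like:

#include <climits>
#include <iostream>

void foo(int x);   // the function above

int main() {
    int x;
    // Only call foo with inputs for which x + 150 cannot overflow.
    if (std::cin >> x && x <= INT_MAX - 150)
        foo(x);
    else
        std::cout << "rejected: x + 150 would overflow\n";
}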
Upvotes: 6