ABu

Reputation: 12279

Example of undefined behaviour involving the use of const_cast

For illustrative purposes, I have been trying to find an example, using GCC, where the output of a program differs with and without optimizations enabled (i.e. with and without -O3). The purpose of finding such an example is to show how optimizations can make an apparently correct program behave differently once they are enabled, if the code contains undefined behaviour.

I have been trying different "combos" of the following program:

// I have tried defining blind in this and in a separate module. The result is the same.
void blind(int const* p) { ++*const_cast<int*>(p); }

#include <iostream>

int constant() { return 0; }

int main()
{
    int const p = constant();
    blind(&p);
    std::cout << p << std::endl;
    return 0; 
}

I was expecting that, without optimizations, this program would print 1, but that with optimizations enabled (-O3) it would print 0 (the compiler replacing std::cout << p with std::cout << 0 directly). That's not the case: the output is the same with and without -O3. If I replace the initialization with int const p = 0, it prints 0 both with and without optimizations, so the behaviour is again the same.

I have tried different alternatives, like doing arithmetic operations (expecting the compiler to "pre-compute" the value or something), calling blind several times, etc., but nothing works.
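
One of those variations, for reference, looked roughly like this (an illustrative sketch, not an exact transcript of every attempt):

#include <iostream>

void blind(int const* p) { ++*const_cast<int*>(p); }

int constant() { return 0; }

int main()
{
    int const p = constant() + 1;      // arithmetic in the initializer
    blind(&p);
    blind(&p);                         // calling blind several times
    std::cout << p * 2 << std::endl;   // arithmetic at the point of use
    return 0;
}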

NOTE: Preferably, one example where the program won't probably crash in the optimized version.

Upvotes: 3

Views: 250

Answers (4)

ABu

Reputation: 12279

A nice and very simple case that matches the kind of example I was looking for is the following:

#include <iostream>
#include <climits>

bool check(int i)
{
    int j = i + 1;
    return j < i;
}

int main()
{
    std::cout << check(INT_MAX) << std::endl;
    return 0;
}

Without optimizations enabled, check returns 1, because the overflow actually happens at run time (INT_MAX + 1 wraps around on typical machines). With optimizations enabled, even just -O1, check returns 0.

I started with:

#include <iostream>
#include <climits>

bool check(int i)
{
    return i + 1 < i;
}

int main()
{
    std::cout << check(INT_MAX) << std::endl;
    return 0;
}

Since signed integer overflow is UB, the compiler directly returned 0 without performing the actual comparison, even without optimizations enabled.

Since the behaviour was still the same with and without optimizations, I decided to move the calculation of i + 1 to a new variable j:

bool check(int i)
{
    int j = i + 1;
    return j < i;
}

Now, in a non-optimized build, the compiler is forced to actually calculate j so that the variable can be inspected with a debugger. The comparison is actually performed, and that's why check returns 1.

However, with -O1, the compiler translated check to its equivalent form return i + 1 < i, which becomes return 0 as in the previous variation of the program.
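
In other words, with -O1 the generated code behaves as if check had been written like this (an illustrative equivalent, not literal compiler output):

bool check(int)
{
    // i + 1 < i can only be true after signed overflow, which is UB,
    // so the optimizer assumes it is always false
    return false;
}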

Upvotes: 0

Peter Cordes

Reputation: 364448

When the initializer for a const int is a constant expression (like 0), the language rules say it becomes constexpr (thanks @Artyer for pointing this out). So there is a difference in the C++ semantics for const int p = 0; vs. const int p = foo(); unless you declare constexpr int foo(){...}, which is probably why compilers optimize them differently in practice.
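
A quick illustration of that rule (my sketch; the array bound is just one way to test whether something counts as a constant expression):

int foo() { return 0; }    // not constexpr

const int a = 0;           // constant-expression initializer: a is usable in constant expressions
const int b = foo();       // runtime initializer: b is not

int arr_a[a + 1];          // OK: a + 1 is an integral constant expression
// int arr_b[b + 1];       // ill-formed in ISO C++: b isn't usable in a constant expression

int main() {}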


When the definition of blind() isn't visible to the optimizer, I think this is still a missed optimization by GCC (and clang, ICC, and MSVC). They could choose to assume that nothing can modify a const the same way they assume nothing modifies a constexpr, because a program that does so has undefined behaviour.

When blind() is in the same compilation unit without __attribute__((noinline,noipa)), the UB is visible at compile time if optimization is enabled, so all bets are off and no amount of weirdness is particularly surprising.
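
For experimenting with that, the attribute syntax looks like this (a sketch assuming GCC; noipa is GCC-specific, and clang only understands noinline):

// Definition visible in this translation unit, but the optimizer
// must not inline it or do any inter-procedural analysis on it:
__attribute__((noinline, noipa))
void blind(int const* p) { ++*const_cast<int*>(p); }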

But with just a prototype for blind(), compilers have to make asm that would work for a blind() that didn't have undefined behaviour, so it's interesting to look at what assumptions/optimizations they did make. And to consider whether they'd be allowed to compile the way you expected.


With const int p = 0;, GCC and clang propagate that constant to later uses of p in the same function (even with optimization disabled), correctly assuming that nothing else can possibly have changed the value of a const object. (Not even a debugger, which is something gcc and clang's -O0 default code gen is designed to support for non-const variables; that's one reason why they make separate blocks of asm for each statement which don't keep anything in registers across statements.)

I think it's a missed optimization to not constant-propagate const int p = constant(); in the same case, after inlining constant() to a constant 0. It's still a const int object so it's still UB for anything else to modify it.

Of course that doesn't happen in a debug build; without inlining constant() they don't know at compile-time what the actual value will be, so they can't use it as an immediate operand for later instructions. So compilers load it from memory at p's usual address, the same one they passed to blind(). So they use the modified value in debug builds, that's expected.

In optimized builds, they don't call constant(); they store an immediate 0 to initialize the stack space whose address they pass to blind(), like we'd expect. But then after the call, they reload p instead of using another immediate 0. This is the missed optimization.
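
A minimal reproduction of that, as a sketch (blind() is only declared, so the optimizer can't see its body; foo is an arbitrary name):

void blind(int const*);        // defined in another translation unit

int constant() { return 0; }

int foo()
{
    int const p = constant();  // inlined to an immediate 0 in optimized builds
    blind(&p);
    return p;                  // missed optimization: compilers reload p from
                               // the stack here instead of zeroing a register
}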

For a large object, it could be more efficient to use the copy that exists in memory instead of generating it again, especially when passing it by reference to a print function. But that's not the case for int: it's more efficient to just zero a register to pass the arg by value to std::ostream::operator<<(int) than to reload it from the stack.


constexpr changes behaviour (for both debug and optimized)

With constexpr int constant(){ return 0; }, GCC and clang treat const int p = constant(); exactly the same as const int p = 0;, because constant() is a constant expression just like 0. It gets inlined even with gcc -O0, and the constant 0 gets used after the call to blind(), not reloading p.
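
That variant is the program from the question with one keyword added (sketch):

#include <iostream>

void blind(int const* p) { ++*const_cast<int*>(p); }

constexpr int constant() { return 0; }   // the only change

int main()
{
    int const p = constant();
    blind(&p);
    std::cout << p << std::endl;   // GCC/clang print 0 even at -O0
    return 0;
}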

Still not an example of code that changes at -O0 vs. -O3, though.


Apparently it matters to the compiler internals that it was initialized with a "constant expression", whether that's a literal or a constexpr function's return value. But that's not fundamental: it's still UB to modify a const int no matter how it was initialized.

I'm not sure if compilers are intentionally avoiding this optimization or if it's just a quirk. Maybe not intentionally for this case, but as collateral damage of avoiding some class of things for some reason?

Or perhaps just because for constant-propagation purposes, it's not known until after inlining constant() that const int p will have a value that's known at compile time. But with constexpr int constant(), the compiler can treat the function call as part of a constant expression, so it definitely can assume it will have a known value for all later uses of p. This explanation seems overly simplistic because normally constant-propagation does work even for things that aren't constexpr, and GCC/clang transform program logic into SSA form as part of compilation, doing most of the optimization work on that, which should make it easy to see if a value is modified or not.

Maybe when considering passing the address to a function, they don't consider that the underlying object is known to be const, only whether it was initialized with a constexpr. If the object in question was only passed or returned by reference to this function, like const int *pptr = foo(); and blind(pptr), the underlying object might not be const, in which case blind() could modify *pptr without UB.
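
Concretely, that's the situation where a blind()-style modification would be legal (a sketch; get() is a hypothetical helper):

int storage = 0;                        // the underlying object is NOT const

const int* get() { return &storage; }   // hands out a pointer-to-const view

void blind(int const* p) { ++*const_cast<int*>(p); }

int main()
{
    blind(get());   // well-defined: the object itself isn't const
    return 0;
}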

I find it surprising that both GCC and clang miss this optimization, but I'm pretty confident that it is actually undefined behaviour for blind() to modify the pointed-to const int, even when it's in automatic storage. (Not static where it could actually be in a read-only page and crash in practice.)

I even checked MSVC and ICC 2021 (classic, not LLVM-based), and they're the same as GCC/clang, not constant-propagating across blind() unless you use a constant expression to init p, making it a constexpr. (GCC/clang targeting other ISAs are of course the same; this optimization decision happens in the target-independent middle-end.)

I guess they all just base their optimization choice on whether or not it's constexpr, even though all 4 of those compilers were independently developed.


To make the asm simpler to look at on the Godbolt compiler explorer, I changed cout << p to volatile int sink = p; to see whether gcc/clang would store a constant zero (mov dword ptr [rsp+4], 0) or would do a load+store to copy from p's address to sink. cout << p << '\n' was simpler, but still messy compared to that.

Seeing constant vs. load+store is the behaviour we're ultimately interested in, so I'd rather see that directly than see a 0 or 1 and have to think back through which output I was expecting in which case. You can mouse over the volatile int sink = p; line and it'll highlight the corresponding instruction(s) in the asm output panes.
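
The whole harness, roughly (my sketch of it; sink is just an arbitrary name for the volatile local):

void blind(int const*);        // opaque to the optimizer

int constant() { return 0; }

void test()
{
    int const p = constant();
    blind(&p);
    volatile int sink = p;     // store of an immediate 0, or a load+store reload?
}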

I could have just done return p from a function not called main, so it's not special. In fact that's even easier and makes even simpler asm (load vs. zero, instead of 2 instructions vs. 1). It also avoids the fact that GCC implicitly treats main as __attribute__((cold)), on the assumption that real programs don't spend most of their time in main. Either way, the missed optimization is still present in int foo().

If you wanted to look at the case where UB is visible at compile time (which I didn't), you could see if it was storing a constant 1 when blind() was inlined. I expect so.

Upvotes: 1

Henrique Bucher

Reputation: 4474

Several cases show different behavior between debug and non-debug/optimized builds. Undefined behavior is not the only reason why this can happen, as is implied in some of the answers and comments.

  1. Debug code runs slower. If the result depends on how long the code runs, as it does in an iterative optimization, the results will be systematically different.

This happens a lot with FPGA "compiling" since the placement/routing phase is essentially just an optimization loop.

Example: let's compute log(2) using my own weird version of the alternating harmonic series. I stop summing the series after a given amount of time has elapsed.

#include <iostream>
#include <cstdint>
#include <cmath>
#include <array>

double calcln2() {
    constexpr size_t N = 1000000;
    std::array<double,N> values;
    for ( double& x : values ) x = 0;
    uint64_t t0 = __builtin_ia32_rdtsc();
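    // keep adding terms until ~10 million TSC ticks have elapsed, so the
    // number of terms summed depends on how fast this build of the loop runs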
    for  ( size_t j=1; __builtin_ia32_rdtsc() - t0 < 10000000ULL; j++ ) {
        for ( double& x : values ) { 
            if ( j%2==0 ) {
                x -= 1/double(j);
            } else {
                x += 1/double(j);
            }
        }
    }
    double sum = 0;
    for ( double& x : values ) sum += x;
    return sum/N;
}


int main() {
    std::cout << log(2) - calcln2() << std::endl;
}

The main() function basically outputs the calculation error. An example debug run gave me 0.193147, while a release run resulted in 0.0399365, much less.

Godbolt: https://godbolt.org/z/zMc5dPns6

I can think of other cases but I will not go in the depth of generating an example code for each.

  2. Release builds often enable fast-math style flags, which might make rounding issues worse. On the other hand, optimizations might collapse an entire series (say, the alternating harmonic series above) into its closed formula, in which case the result will be more precise.

  3. Executable size will be larger, which can have side effects.

  4. Asserts will only trigger in debug mode, so the program will crash in one build and not in the other; see the sketch below.
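
A minimal illustration of the assert point (a sketch; release builds conventionally define NDEBUG, which turns assert into a no-op):

#include <cassert>

int main() {
    int x = 0;
    assert(x == 1);   // aborts in a debug build; does nothing with -DNDEBUG
    return 0;
}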

Upvotes: 0

Douglas B

Reputation: 802

Now may be my time to shine. I asked this question a while ago, and it seems to demonstrate perfectly what you are looking for in a very short/simple program, which I will include below for completeness:

#include <iostream>

int broken_for_loop(){
    for (int i = 0; i < 10000; i+= 1000){
        std::cout << i << std::endl;
    }
}

int main(int argc, char const *argv[]){
    broken_for_loop();
}

You can see the discussion/explanation there (long story short, I don't return from a function that should return an int), but I think it does a good job of demonstrating how some UB can be pretty sneaky in presenting itself only in optimized binaries if you're not thinking about it/paying attention to compiler warnings.

Adding in case it wasn't clear: when compiled without optimization, the program prints 0 to 9000 (in steps of 1000) and then exits properly. When compiled with -O3, the loop runs forever.
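
For contrast, a sketch of the fixed version, which behaves the same in both builds (the missing return was the UB):

#include <iostream>

int fixed_for_loop(){
    for (int i = 0; i < 10000; i += 1000){
        std::cout << i << std::endl;
    }
    return 0;   // flowing off the end of a value-returning function is UB
}

int main(int argc, char const *argv[]){
    fixed_for_loop();
}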

Compiled with: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Upvotes: 2
