Reputation: 1705
I am curious. Does anyone know a non-hypothetical C or C++ counterexample to the myth that "the impact of the undefined behaviour is limited to code which runs after the line with undefined behaviour"? In other words I want to see an example of undefined behaviour "travelling back in time" and causing an externally visible effect. By non-hypothetical I mean an example of a real, production-grade compiler (preferably a recent-ish version too) violating this assumption.
I want to see an example where something is printed to stdout (or stderr) that would not be printed if the myth were true, or where something that should be printed is not, or where something is printed but the text is wrong. Why the insistence on printing to stdout? I have seen some examples that use writing to a volatile variable as an "externally observable" side effect. I am not very convinced by those examples, because I do not see how I would be able to actually observe those writes in the common case of a regular process in user space (i.e. not the OS kernel, a device driver, or embedded code).
Edit: Yakk - Adam Nevraumont pointed out that writing to stdout is done via library calls that the compiler cannot reason about. It occurred to me that there exists another form of interprocess communication that I think the compiler might be able to reason about: atomic writes to memory shared with another process. Unfortunately, it sounds like interprocess communication via shared memory is not codified in the normative parts of the C++ standard.
The printing has to be done by code that is executed before the line responsible for undefined behaviour. I am not interested in examples of wrong output being printed after the line with undefined behaviour has already been executed. Note: since it does not make sense to talk about execution of a program whose behaviour is undefined, please insert "if the behaviour were not undefined" where appropriate when interpreting this paragraph.
Edit: I found a better way to word this: I am asking for an example that would demonstrate a difference between the real undefined behaviour "all bets are off as soon as the program gets an input for which the behaviour of the program is undefined" and a fictional version of undefined behaviour "the program executes normally until it hits the line that causes undefined behaviour and after that all bets are off". I want to see an example of the code that would have executed normally if the myth were true doing something strange instead. Code that goes "after" the bad line misbehaving in some way can be explained by nasal daemons under both the correct and the incorrect mental model of undefined behaviour and is thus not a counterexample.
Here is the closest I was able to find:
void bar(void);

int a;

void foo3(unsigned y, unsigned z)
{
    bar();
    a = y % z;
}
According to the blog linked above, some (unknown) version of clang compiles this to:
foo3:
pushl %ebp
movl %esp, %ebp
pushl %esi
subl $4, %esp
movl 8(%ebp), %eax
xorl %edx, %edx
divl 12(%ebp) <-- might crash the program
movl %edx, %esi
call bar <-- might be side effecting
movl %esi, a
addl $4, %esp
popl %esi
popl %ebp
ret
I was able to replicate a similar assembly output in clang 3.3 on godbolt, but not on more recent versions of clang.
I believe this example to be a compiler bug. Clang appears to assume that bar() is guaranteed to return, but C has exit() and longjmp(), so the code after the call to bar() may be dead code. The behaviour of a program is undefined only if the execution state that does the bad thing is reachable.
Upvotes: 61
Views: 7379
Reputation: 58142
This might be too obvious, or not quite within the rules:
#include <stdio.h>

static void foo(void);

int main(void) {
    printf("Everything ok so far\n");
    fflush(stdout);
    foo();
}

static void foo(void) {
    volatile char huge[999999999] = {0};
}
https://godbolt.org/z/arTW5KxEn
You might think that nothing in the function foo could possibly prevent the message from being printed. But clang inlines it, and then tries to do all stack allocation in the prologue of main, which overflows the stack and crashes the program before the message is printed.
This might not technically count as UB. I had a recollection that the standard had something like "if implementation-defined limits are exceeded then the behavior is undefined" but I don't actually see it anywhere.
Interestingly, though unsurprising when you think about it, such code can crash even if the function foo() is never actually called at runtime: https://godbolt.org/z/eeYdrPY3G
Upvotes: 15
Reputation: 677
This wouldn't be a discussion about time traveling without time machines:
#include <stdio.h>
#include <stdlib.h>

typedef int (*FluxCapacitor)();

static FluxCapacitor TimeMachine;

static int DeLorean() {
    int year = 1955;
    return printf("Marty in %d.\n", year);
}

void TemporalParadox() {
    TimeMachine = DeLorean; // This is ok, no one calls this function.
    abort();                // But let's be sure.
}

int main() {
    TimeMachine();
    return 0;
}
Compiling and running the code on various recent compilers all generates the same output:
Marty in 1955.
How? DeLorean() is never called, because TemporalParadox() also is never called. Both are effectively dead code.
Let's examine the assembly code from one of the links above:
TemporalParadox: # @TemporalParadox
push rax
call abort@PLT
main: # @main
push rax
lea rdi, [rip + .L.str]
mov esi, 1955
xor eax, eax
call printf@PLT
xor eax, eax
pop rcx
ret
.L.str:
.asciz "Marty in %d.\n"
In other words, main() got statically replaced by the body of a function that is unreachable.
Again, how?
An optimizing compiler is allowed to shuffle things around, but without violating any defined behaviour. The code above, however, already has UB (this is a given in any answer to this question), so I chose an example where an optimizing compiler shuffles around the UB itself, to the effect of time traveling dead code to before the start of the program, thereby executing a line before the UB source line. Here is how.
The compiler sees there is UB somewhere, so anything can happen, including the compiler trying to optimize the UB away.
If the compiled program is to have no undefined behavior, then TimeMachine must be initialized by the time it is called. The only thing that initializes TimeMachine is one of TemporalParadox()'s lines, so in a well-formed program that line must have executed beforehand.
As TimeMachine() is called at the beginning of the program, the effects of the line TimeMachine = DeLorean; must be in place before this call, and therefore before the beginning of the program.
The compiler lifts one of TemporalParadox()'s lines but never calls the function itself, so abort() is not called.
Because TimeMachine is statically initialized to DeLorean and never assigned anything else, DeLorean() can be inlined in all places where TimeMachine() is called.
And by the magic of a non-existent time traveling machine, Marty got back to 1955.
This is obviously an adaptation of Mature Pessimizations's Undefined behavior can literally erase your hard disk, and there are some points of note:
The impact of the undefined behaviour is not limited to code which runs after the line with undefined behaviour: there is no code running after the UB line at all;
To generate the result, the compiler is statically running dead code, before the explicit UB line;
Further, the static inlining is only possible if the initialization occurs before any program line even runs, so in a way the UB line is generating effects before any program line, UB or not.
If you don't believe that the line TimeMachine = DeLorean; is time traveling, try commenting it out in each of godbolt.org's links above. Because there is no way dead code influences live code, right? Right?
TL;DR: The mere presence of UB can time travel dead code into existence, before time itself, and generate impossible printing to stdout on various recent compilers.
Upvotes: 31
Reputation: 7393
I am asking for an example that would demonstrate a difference between the real undefined behaviour "all bets are off as soon as the program gets an input for which the behaviour of the program is undefined" and a fictional version of undefined behaviour "the program executes normally until it hits the line that causes undefined behaviour and after that all bets are off".
This phrasing is somewhat easier than the initial form of the question. Consider this:
#include <utility>  // std::unreachable (C++23)

int value;

static void setvalue() { value = 1; }

[[gnu::noinline]] static void ub() {
    setvalue();
    std::unreachable();
}

int main() {
    ub();
    return value;
}
UB is triggered by execution passing through std::unreachable(). In a fictional version of UB where execution prior to that is unchanged, we would have value == 1. This is observed via the exit code afterwards, but the actual execution and logic that change are before. godbolt
And this example is illuminating for the general case. The cases of UB that are caused by evaluating some expression (division by 0, a call to std::unreachable, dereferencing an invalid pointer, etc.) can be assumed not to happen. So you can imagine these evaluations as if they were replaced with a call to std::unreachable(). That is a correct transformation.
Then you can imagine that any code which unconditionally leads to that execution is dead code and can be removed.
And that is why some of the examples of time travel are unsatisfying. Opaque function calls to functions in some other library could never return. In such a case, the compiler cannot prove that the bad evaluation will definitely happen. This can include standard library I/O and syscalls. (This is less true with LTO, which can see across library boundaries.)
So... it can be difficult to cause a different observation prior to the execution of the bad expression, but the actual logic executed prior to it can easily and obviously change, and that can be observed in some less common ways before the execution (e.g. shared memory, smarter compilers, LTO) and in any way after (notably without the logic after it being altered or incorrect).
Upvotes: 6
Reputation: 275545
#include <iostream>

__attribute__((noinline))
void foo(int* data) {
    bool bsilly = false;
    if (data) bsilly = true;
    if (bsilly) {
        std::cerr << "This should not print\n";
    } else {
        *data = 7;
    }
}

int main() {
    foo(nullptr);
}
The output we get is:
This should not print
with no segfault. We passed in nullptr, which should lead to bsilly remaining false and UB from *nullptr = 7. Instead, the compiler generates code that prints to cerr and returns, despite data being nullptr.
The use of __attribute__((noinline)) is to make it act like a function in a different translation unit without whole-program optimization. A dynamic library should work similarly.
The "undefined behavior" in dereferencing a null pointer "caused" text to print, despite the dereference never running. And if you comment out the *data = 7; line, it no longer prints "This should not print\n" when passed nullptr.
I mean, technically this isn't time travel, as *data = 7 happens elsewhen in an alternative universe, not later.
In a "lie told to children", my guess is that the compiler deduces as follows: if data is null, then we run UB. Thus either data is nullptr (and we can do anything), or data is not nullptr. Conclusion: assume data is not nullptr, thus bsilly = true occurs, which means we print to std::cerr.
The compiler used to demonstrate this was clang-trunk x86-64 on godbolt; it is the development branch after 18.1 (possibly leading to 19.0), based on the list of compilers.
I wrote it with the "elsewhen" technique to avoid the compiler not "knowing" what the IO code does; if IO code can run arbitrary operations, then reasoning about the state before the IO call from the code after it becomes impossible. So what I did was make the "undefined behavior" branch not run the IO code and be fully amenable to local reasoning, thus causing "undefined behavior time travel".
You can modify the code so that it segfaults after printing if you prefer. There are a number of ways, but you still need to pull off tricks, as a naked *data = 7; won't trigger the branch analysis needed for the time travel.
Upvotes: 90
Reputation: 81179
If an implementation allows distinguishing between the traps associated with different actions over which the Standard waives jurisdiction, operations that are expected to be performed without side effects may be reordered across each other, causing the "wrong" kind of trap to fire.
int *volatile v;

void test(int x)
{
    for (int i = 0; i < 1000; i++)
    {
        *v = 1;
        int q = 1000 / x;
        *v = q;
    }
}

void (*volatile vtest)(int) = test;

int main()
{
    vtest(0);
}
This is demonstrable in clang and gcc on godbolt; changing the divide expression to something that isn't loop-invariant, like i/x, would result in the first access to *v being performed before the division (causing a SIGSEGV), but with the code as written above the division is performed first (yielding a SIGFPE).
Upvotes: 6