dxiv
dxiv

Reputation: 17638

C "observable behavior" in the context of UB "undefined behavior"

(Question was originally prompted by comments under this answer to Are there race conditions in this producer-consumer implementation? but is being asked here strictly from the C language perspective, without any concurrency or multi-threading involved.)

Consider this minimal code:

#define BUFSIZ 10
char buf[BUFSIZ];

void f(int *pn)
{
    buf[*pn]++;
    *pn = (*pn + 1) % BUFSIZ;
}

int main()
{
    int n = 0;
    f(&n);
    return n; 
}

Question: would the C "as-if" rules allow the compiler to rewrite the code as follows?

void f(int *pn)
{
    int n = *pn;
    *pn = (*pn + 1) % BUFSIZ;
    buf[n]++;
}

On one hand, the above would not change the observable behavior of the program as written.

On the other hand, f could get called with an invalid index, possibly from another translation unit:

int g()
{
    int n = -1001;
    f(&n);
}

In this latter case, both variants of the code would invoke UB when accessing the out-of-bounds array element. However, the original code would leave *pn at the value being passed into f (= -1001) while the rewritten code would step into UB-land only after modifying *pn (to 0).

Would such a difference count as "observable" or, back to the actual question, is there anything in the C standard that would specifically allow or preclude this type of code rewrite/optimization?

Upvotes: 3

Views: 207

Answers (1)

rici
rici

Reputation: 241681

  1. If any part of your program has undefined behaviour, then the behaviour of the entire program is undefined. In other words, the behaviour of the program is undefined even "before" any construct whose behaviour is undefined. (This is necessary to allow the compiler to perform certain optimizations which depend on behaviour being defined.)

  2. Given that neither variable is declared as volatile, I believe it is possible that the order of memory updates would be reordered as indicated, since observable behaviour is only guaranteed to be according to the execution model in the absence of undefined behaviour.

  3. "Observable behaviour" (in Standard C) is defined in §5.1.2.3 as:

    • Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
    • At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
    • The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.

    This list does not include any potential response to undefined behaviour (such as a trap or signal), even though in vernacular terms a segfault is normally observable. The particular example in the question does not involve any of these three points. (The UB could prevent the program from successfully terminating, which basically voids the second point in observable behaviour.) So in the particular case of the code in the question, reordering would not alter any observable behaviour and could clearly be performed.

  4. My statement that the implementation's response to undefined behaviour is not limited to execution strictly following the component which engenders undefined behaviour created a lot more controversy in the comments thread than I expected, since it is a fairly well-known feature of modern C. It may be worth reviewing John Regehr's useful essay on undefined behaviour, from which I quote: (in the third part)

    More specifically, when a program dies due to performing an illegal operation such as dividing by zero or dereferencing a null pointer, is this considered to be side effecting? The answer is definitely “no.” … Since crash-inducing operations are not side effecting, the compiler can reorder them with respect to other operations,

    As a possibly more interesting example (taken from the comments thread), if a program produces several lines of output, and then deliberately executes an explicit divide-by-zero, one might expect that compiling and running the program would produce the output before responding in whatever undefined way it responds to the divide-by-zero. However, a compiler which detected the divide-by-zero and could prove that the control flow of the program guaranteed its execution would be perfectly entitled to produce an error message at translation time and refuse to produce an executable image.

    Alternatively, if it could not prove that control flow reached the divide-by-zero, it would be entitled to assume that the divide-by-zero could not happen, and therefore remove all code unequivocally leading up to the divide-by-zero (including the calls to the output function) as dead code.

    Both of the above fit into the list of example responses to undefined behaviour in §3.4.3: "from ignoring the situation completely with unpredictable results, … to terminating a translation or execution (with the issuance of a diagnostic message)."

Upvotes: 5

Related Questions