claws

Reputation: 54100

Which of the following combinations of post & pre-increment operators have undefined behaviour in C?

I've read "Could anyone explain these undefined behaviors (i = i++ + ++i , i = i++, etc...)" and tried to understand sequence points from the comp.lang.c FAQ, after spending more than 2 hours trying to explain the following results produced by gcc.

expression (i=1; j=2)    i       j       k
k = i++ + j++;          2       3       3
k = i++ + ++j;          2       3       4
k = ++i + j++;          2       3       4
k = ++i + ++j;          2       3       5

k = i++ + i++;          3               2
k = i++ + ++i;          3               4
k = ++i + i++;          3               4
k = ++i + ++i;          3               6

i = i++ + j++;          4       3
i = i++ + ++j;          5       3
i = ++i + j++;          4       3
i = ++i + ++j;          5       3

i = i++ + i++;          4
i = i++ + ++i;          5
i = ++i + i++;          5
i = ++i + ++i;          6

Question:

  1. I want to know whether all the expressions shown (in 4 groups) in the figure above have undefined behavior. If only some of them do, which ones do and which ones don't?

  2. For the expressions with defined behaviour, could you show (not explain) how the compiler evaluates them? I just want to make sure I have understood pre-increment and post-increment correctly.

Background:

Today I attended a campus interview in which I was asked to explain the result of i++ + ++i for a given value of i. After compiling that expression with gcc, I realized that the answer I gave in the interview was wrong. Determined not to make such a mistake again, I compiled all possible combinations of pre- and post-increment operators with gcc and tried to explain the results. I struggled for more than 2 hours and couldn't find a single consistent rule for how these expressions are evaluated, so I gave up and turned to Stack Overflow. After a little reading of the archives, I found that there are such things as sequence points and undefined behaviour.

Upvotes: 5

Views: 2010

Answers (7)

user50619

Reputation: 353

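Commas are a bit tricky. The comma operator (typically used to update several variables in a for loop) evaluates its operands left to right, with a sequence point after each one, and that holds however many operands are chained together. The commas that separate function arguments, however, are not comma operators: the order in which the arguments are evaluated is unspecified, and there is no sequence point between them. (The commas between declarators, as in int a = 0, b = a;, are different again: the end of each full declarator is a sequence point, so those initializers are evaluated in order.)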

So

int a=0;
function_call(++a, ++a, ++a);

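has undefined behaviour: a is modified three times with no sequence point separating the modifications, so it can produce unpredictable results.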

Upvotes: 0

supercat

Reputation: 81115

In cases where a compiler can tell that two lvalue expressions identify the same object, there would be no meaningful cost to having it behave in some sensible fashion. The more interesting scenarios are those in which one or more of the operands are dereferenced pointers.

Given the code:

void test(unsigned *a, unsigned *b, unsigned *c)
{
  (*a) = (*b)++ + (*c)++;
}

there are many sensible ways in which a compiler might process that. It could load *b and *c, add them, store the result to *a, and then increment *b and *c; or it could load *b and *c, compute *b + *c, *b + 1, and *c + 1, and then write the three results in arbitrary order; or it could perform any of countless other sequences of operations. On some processors some arrangements might be more efficient than others, and a compiler should have no reason to expect that programmers would regard any arrangement as more suitable than any other.

Note that on most hardware platforms there would be only a limited number of plausible behaviors if identical pointers were passed for a, b, and c, but the authors of the Standard make no effort to distinguish plausible from implausible outcomes. Many implementations could, at essentially zero cost, offer some behavioral guarantee (e.g. that code like the above would always set *a, *b, and *c to some possibly-Unspecified values without any other side-effects), and such a guarantee might sometimes be useful (if the pointers identify distinct objects whenever the objects' values matter, but might not do so otherwise). Even so, it is fashionable for compiler writers to assume that any slight optimization they could achieve when granted carte blanche to trigger arbitrarily destructive side-effects will be worth more than the value programmers could receive from an assurance of constrained behavior.

Upvotes: 0

prashant dhaundiyal

Reputation: 1

In most cases gcc appears to apply the pre-increments first, use the resulting value in the operation, and apply the post-increments afterwards.

For example, in block 2 the first line has no pre-increments, so the value 1 is used for i:

k = i++ + i++ // hence k = 1 + 1 = 2

and the two post-increments then leave i = 3.

In the second line, the one pre-increment changes i to 2:

k = i++ + ++i // hence k = 2 + 2 = 4

and the one post-increment then leaves i = 3.

The same applies to k = ++i + i++.

In the last line, the two pre-increments make i = 3:

k = ++i + ++i // hence k = 3 + 3 = 6

and i stays at 3.

Hope that explains it a bit. But it depends entirely on the compiler, since these expressions have undefined behaviour.

Upvotes: -1

John Bode

Reputation: 123458

I want to know if all the expressions shown (in 4 groups) in above figure have undefined behavior?

Lines 2 through 5:

k = i++ + j++;
k = i++ + ++j;
k = ++i + ++j;
k = ++i + j++;

are all well-defined. All the other expressions have undefined behavior, because each of them attempts to modify the value of an object through the evaluation of an expression more than once between sequence points (for these examples, the sequence point occurs at the ';' terminating each statement). For example, i = i++; is undefined because we're trying to modify the value of i through both an assignment and a postfix ++ without an intervening sequence point. Note that the = operator doesn't introduce a sequence point, whereas the ||, &&, ?: and comma operators do.

For defined behaviour expressions, kindly can you show (not explain) how compiler evaluates them.

Let's start with

k = i++ + j++;

The expression a++ evaluates to the current value of a, and at some point before the next sequence point, a is incremented by 1. So, logically, the evaluation goes something like

k = 1 + 2; // i++ evaluates to 1, j++ evaluates to 2
i = i + 1; // i is incremented and becomes 2
j = j + 1; // j is incremented and becomes 3

However...

The exact order in which the expressions i++ and j++ are evaluated, and the order in which their side effects are applied, is unspecified. The following is a perfectly reasonable ordering of operations (using pseudo-assembly code):

mov j, r0        ; read the value of j into register r0
mov i, r1        ; read the value of i into register r1
add r0, r1, r2   ; add the contents of r0 to r1, store result to r2
mov r2, k        ; write result to k
inc r1           ; increment value of i
inc r0           ; increment value of j
mov r0, j        ; store result of j++
mov r1, i        ; store result of i++

DO NOT ASSUME LEFT-TO-RIGHT EVALUATION OF ARITHMETIC EXPRESSIONS. DO NOT ASSUME THAT OPERANDS OF ++ and -- ARE UPDATED IMMEDIATELY AFTER EVALUATION.

Because of this, the result of expressions like i++ + ++i will vary based on the compiler, compiler settings, and even the surrounding code. The behavior is left undefined so that the compiler isn't required to "do the right thing", whatever that may be. You will get a result, but it won't necessarily be the result you expect, and it won't be consistent across all platforms.

Looking at

k = i++ + ++j;

the logical evaluation is

k = 1 + 3  // i++ evaluates to i (1), ++j evaluates to j + 1 (2 + 1 = 3)
i = i + 1
j = j + 1

Again, here's one possible ordering of operations:

mov j, r0
inc r0
mov i, r1
add r0, r1, r2
mov r2, k
mov r0, j
inc r1
mov r1, i

Or it could do something else. The compiler is free to change the order in which individual expressions are evaluated if it leads to a more efficient order of operations (which my examples almost certainly aren't).

Upvotes: 5

P.P

Reputation: 121347

Except the first group, all expressions in the other three groups have undefined behaviour.

How the expressions with defined behaviour are evaluated (group 1):

i=1, j=2;

k=i++ + j++; // 1 + 2 = 3
k=i++ + ++j; // 1 + 3 = 4
k=++i + ++j; // 2 + 3 = 5
k=++i + j++; // 2 + 2 = 4

It's fairly straightforward: just the difference between post-increment and pre-increment.

In group 2 and group 4, it's quite easy to see the undefined behaviours.

Group 3 has undefined behaviour because the = operator doesn't introduce a sequence point, so i is modified both by the assignment and by i++ or ++i.

Upvotes: 9

caf

Reputation: 239011

The first group are all defined. Each statement increments both i and j as a side effect sometime before the next sequence point, so i is left as 2 and j as 3. In addition, i++ evaluates to 1, ++i evaluates to 2, j++ evaluates to 2 and ++j evaluates to 3. This means that the first assigns 1 + 2 to k, the second assigns 1 + 3, the third assigns 2 + 2 and the fourth assigns 2 + 3.

The remainder all have undefined behaviour. In the second and third groups, i is modified twice before a sequence point; in the fourth group, i is modified three times before a sequence point.

Upvotes: 2

Keith Thompson

Reputation: 263177

There are no sequence points within any of these statements. There are sequence points between them.

If you modify the same object twice between consecutive sequence points (in this case, either via = or via prefix or postfix ++), the behavior is undefined. So the behavior of the first group of 4 statements is well defined; the behavior of the others is undefined.

If the behavior is defined, then i++ yields the previous value of i, and as a side effect modifies i by adding 1 to it. ++i modifies i by adding 1 to it, and then yields the modified value.

Upvotes: 5
