innochenti
innochenti

Reputation: 1103

Is this undefined behaviour and why?

Please, explain why this code is correct or why not: In my opinion, line ++*p1 = *p2++ has undefined behaviour, because p1 is dereferenced first and then incrementing.

int main()
{
   char a[] = "Hello";
   char b[] = "World";

   char* p1 = a;
   char* p2 = b;

   //*++p1 = *p2++; // is this OK?
   ++*p1 = *p2++; // is this OK? Or this is UB?

   std::cout << a << "\n" << b;

   return 0;
}

Upvotes: 2

Views: 254

Answers (2)

Serge Dundich
Serge Dundich

Reputation: 4429

In modern C++ (at least C++ 2011 and later) neither is undefined behavior. And even neither is implementation defined or unspecified. (All three terms are different things.)

These two lines are both well defined (but they do different things). When you have pointers p1 and p2 to scalar types then

*++p1 = *p2++;

is equivalent to

p1 = p1 + 1;
*p1 = *p2;
p2 = p2 + 1;

(^^^this is also true for C++ 1998/2003)

and

++*p1 = *p2++;

is equivalent to

*p1 = *p1 + 1;
*p1 = *p2;
p2 = p2 + 1;

(^^^maybe also in C++ 1998/2003 or maybe not - as explained below)

Obviously in case 2 incrementing value and then assigning to it (thus overwriting just incremented value) is pointless - but there may be similar examples that make sense (e.g. += instead of =).

BUT like many people point out - just don't write the code that looks ambiguous or unreasonably complex. Write the code that is clear to you and supposed to be clear to the readers.

Old C++ 1998/2003 case for second expression is a strange matter:

At first after reading the description of prefix increment operator:

ISO/IEC 14882-2003 5.3.2:

  1. The operand of prefix ++ is modified by adding 1, or set to true if it is bool (this use is deprecated). The operand shall be a modifiable lvalue. The type of the operand shall be an arithmetic type or a pointer to a completely-defined object type. The value is the new value of the operand; it is an lvalue. If x is not of type bool, the expression ++x is equivalent to x+=1.

I personally have a strong feeling that everything is perfectly defined and obvious and the same as above for C++ 2011 and later. At least in the sense that every reasonable C++ implementation will behave in exact same well defined way (including old ones).

Why it should be otherwise if we always intuitively rely on a general rule that in any simple operator evaluation within a complex expression we evaluate its operands first and after that apply the operator to the values of those operands. Right? Breaking this intuitive expectation would be extremely stupid for any programming language.

So for the full expression ++*p1 = *p2++; we have operands: 1 - ++*p1 evaluated as already incremented lvalue (as defined in the above quote from C++ 2003) and 2 - *p2++ that is an rvalue stored at pointer p2 before its increment. It doesn't look ambiguous at all. Of course in this case - no reason to increment a value you are overwriting anyway BUT if there was double increment instead - ++(++*p1); OR other kind of assignment like +=/-=/&=/*=/etc instead of simple assignment THAT would not be unreasonable at all.

Unfortunately all the intuition and logic is messed up by this:

ISO/IEC 14882-2003 - 5 Expressions:

  1. Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified. Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined.
  [Example:
      i = v[i++];         // the behavior is unspecified
      i = 7, i++, i++;    // i becomes 9
      i = ++i + 1;        // the behavior is unspecified
      i = i + 1;          // the value of i is incremented
  —end example]

So this wording if interpreted in a paranoid way seems to imply that modification of a value stored in a specific location more than once without intervening sequence point is explicitly forbidden by this rule and the last sentence declares that failing to comply with every requirement is Undefined Behavior. AND our expression seems to modify the same location more that once (?) with no sequence point until the full expression evaluated. (This arbitrary and unreasonable limitation is reinforced further by example 3 - i = ++i + 1; though it says // the behavior is unspecified - not undefined as in the wording before - which only adds more confusion.)

BUT on the other hand... If we ignore the example 3. (Maybe i = ++i + 1; is a typo and there should have been postfix increment instead - i = i++ + 1;? Who knows... Anyway examples are not part of formal specification.) If we interpret this wording in the most permissive way - we can see that in each allowed order of evaluation of subexpressions of the whole expression - preincrement ++*p1 must be evaluated to an LVALUE (which is something that allows further modification) BEFORE applying assignment operator so the only valid final value at that location is the one that is stored with assignment operator. ALSO NOTE that conforming C++ implementation have no obligation to actually modify that location more than once and may instead store only final result - that is both reasonable optimization allowed by the standard and may be actual demand of this article.

Which one of those interpretations is correct? Paranoid or permissive? Universally applicable logic or some suspicious and ambiguous words in a document almost nobody really ever read? Blue pill or Red pill?

Who knows... It looks like a gray area that requires less ambiguous explanation.

If we interpret the quote from C++ 2003 standard above in a paranoid way then it looks like this code may be Undefined Behavior:

#include <iostream>

#define INC(x) (++(x))

int main()
{
    int a = 5;

    INC(INC(a));
    std::cout << a;
    return 0;
}

while this code is perfectly legitimate and well defined:

#include <iostream>

template<class T> T& INC(T& x) // sequence point after evaluation of the arguments
{                              // and before execution of the function body
    return ++x;
}

int main()
{
    int a = 5;

    INC(INC(a));
    std::cout << a;
    return 0;
}

Really?

All this looks very much like a defect of the old C++ standard.

Fortunately this has been addressed in newer C++ standards (starting with C++ 2011) as there is no such concept as sequence point anymore. Instead there is a relation - something sequenced before something. And of course the natural guarantee that evaluation of the argument expressions of any operator is sequenced before evaluation of the result of the operator is there.

ISO/IEC 14882-2011 - 1.9 Program execution

  1. Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread (1.10), which induces a partial order among those evaluations. Given any two evaluations A and B, if A is sequenced before B, then the execution of A shall precede the execution of B. If A is not sequenced before B and B is not sequenced before A, then A and B are unsequenced. [ Note: The execution of unsequenced evaluations can overlap. — end note ] Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A, but it is unspecified which. [ Note: Indeterminately sequenced evaluations cannot overlap, but either could be executed first. — end note ]

  2. Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.

  3. Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [ Note: In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations. — end note ] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either anotherside effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

  [ Example:
      void f(int, int);
      void g(int i, int* v) {
      i = v[i++];         // the behavior is undefined
      i = 7, i++, i++;    // i becomes 9
      i = i++ + 1;        // the behavior is undefined
      i = i + 1;          // the value of i is incremented
      f(i = -1, i = -1);  // the behavior is undefined
  }
  — end example ]

(Also NOTE how C++ 2003 prefix increment example i = ++i + 1; is replaced by postfix increment example i = i++ + 1; in this C++ 2011 quote. :) )

Upvotes: 1

6502
6502

Reputation: 114461

The first is ok

*++p1 = *p2++ // p1++; *p1 = *p2; p2++;

the second is UB with C++ because you are modifying what is pointed by p1 twice (once because of increment and once because of assignment) and there are no sequence points separating the two side effects.

With C++0x rules things are different and more complex to explain and to understand. If you write intentionally expressions like the second one, if it's not for a code golf competition and if you are working for me then consider yourself fired (even if that is legal in C++0x).

I don't know if it is legal in C++0x and I don't want to know. I've too few neurons to waste them this way.

Upvotes: 10

Related Questions