nawfel bgh

Reputation: 1552

Why does (*p=*p) & (*q=*q); in C trigger undefined behavior?

Why does (*p=*p) & (*q=*q); in C trigger undefined behavior if p and q are equal?

int f2(int * p, int * q)
{
  (*p=*p) & (*q=*q);
  *p = 1;
  *q = 2;
  return *p + *q;
}

Source (Nice article by the way): http://blog.frama-c.com/index.php?post/2012/07/25/On-the-redundancy-of-C99-s-restrict

Upvotes: 6

Views: 2236

Answers (5)

supercat

Reputation: 81217

When the C Standard was written, if the effect of an action would vary across platforms, if a given platform could not always guarantee any particular precise effect, and if there might plausibly exist implementations where the action could trigger a hardware trap whose behavior was outside the control of the C compiler, there was little perceived value in having the Standard say anything about the behavior. Even if there wasn't any significant likelihood of a hardware trap, the possibility of "surprising" behavior was sufficient to brand behavior as Undefined.

Consider, for example, unsigned long x,*p; ... *p=(x++);. If p==&x, *p might end up holding the old value of x or a value 1 greater, but if x was e.g. 0x0000FFFF it might also plausibly end up holding 0x00000000 or 0x0001FFFF. Even if no machine would trigger a hardware trap, I don't think the Standard's authors would have considered "Any lvalue modified more than once will hold Indeterminate Value, and any read of an lvalue in the same expression that writes it in a manner other than allowed herein may yield Indeterminate Value" to be in any way more useful than simply declaring such actions to be Undefined Behavior. Further, from the point of view of the Standard's authors, the Standard's failure to mandate particular behaviors that some platforms could provide free of charge and others could not was not expected to pose an obstacle to the specification of such behaviors on platforms that could provide them.

In practice, even very loosely-specified behaviors can often be very useful for programs which share the following two requirements with the vast majority of programs written today:

  1. When given valid input, produce correct output.
  2. When given invalid input, do not launch nuclear missiles.

Unfortunately, someone came up with the idea that if the C Standard does not mandate the behavior of some action X in a particular situation Y, even if most compilers happen to have behavior which would be adequate for programs seeking to meet the above requirements (e.g. most compilers will generate for the expression p < q code that will either yield 0 or 1 and have no other side-effects, even when p and q identify unrelated objects), then action X should be regarded as an indication to the compiler that the program will never receive any input which would cause situation Y.

The indicated (*p=*p) & (*q=*q) is intended to represent such a "promise". The logic is that since the Standard wouldn't say anything about what a compiler can do if p==q, a compiler should assume that the programmer wouldn't mind if the program would launch nuclear missiles in response to any input that could cause the code to be executed when p==q.

That idea and its consequences are fundamentally antithetical to the very nature and design objectives of C and its use as a systems-programming language. Nearly all systems offer some features and guarantees beyond those mandated by the Standard, though the specifics vary from one system to the next. I consider preposterous the idea that the language is better served by redefining x < y from "I'm willing to accept whatever means of pointer comparison is used by any hardware on which this program is actually going to be run" to "I'm so certain that these two pointers will be related that I would stake my life on it" than it would be by adding a new means of directing the compiler to assume that "x and y are related pointers", but somehow it seems to be becoming accepted.

Upvotes: 2

Paul Ogilvie

Reputation: 25286

The question of this thread starts with "Why does (*p=*p) & (*q=*q); in C trigger undefined behavior if p and q are equal?" and the asker refers to an article that reasons that the new restrict keyword in C (and C++?) is unnecessary because we can tell the compiler the same thing by writing the expression (*p=*p) & (*q=*q);.

The explanation of this expression by user Iwillnotexist Idonotexist is very thorough...and very complex. Basically, the conclusion is that this is a directive rather than a statement, since the expression yields no result that is used and only has side effects (assignment to self) that have no effect (self remains unchanged, even if p==q), so any good compiler may optimize it out.

Since I still do not completely grasp the explanation, I would opt for the new keyword rather than risk writing the expression wrong.

Upvotes: 1

M.M

Reputation: 141628

If *p and *q designate the same memory location, then writing to them both without an intervening sequence point (or sequence relation in C11) causes undefined behaviour.

= and & do not introduce sequence points.

The code is equivalent to int i = 0; (i=i) & (i=i); which has UB for the same reason. Another similar example would be (*p = 1) & (*q = 2).

Upvotes: 4

rici

Reputation: 241861

In simple terms, (*p = *p) & (*q = *q) is undefined if p and q have the same value because:

  • You can't mutate the same location twice in an unsequenced evaluation; and
  • You can't read from a location which is being mutated in the same unsequenced evaluation.

This is undefined behaviour in both C and C++, although the standard wordings are slightly different. (The above text doesn't correspond to either standard precisely; it was intended as a simplified explanation. I'm sure you can find precise texts on SO.)

The & operator is a simple bitwise and, so it does not impose any evaluation order. It may seem like *p = *p is an obvious no-op, but there is no guarantee that it is implemented that way. A compiler may (for example) implement that as tmp = *p; *p = 0; *p += tmp. It may also not be able to set all the bits of *p at once, requiring that the assignment be done piecemeal.


Now, a little personal bugbear. The expression <something> "triggers undefined behaviour" makes it sound like there is some category of behaviour called "undefined behaviour", perhaps a kind of big red button which will start firing nasal demons in all directions when pressed. That's not a good model for what is happening. It is better to say "the behaviour of <something> is undefined".

Be aware that the behaviour of an entire program is undefined if any part of the program which is executed has undefined behaviour. The entire program, not the part of the program starting with the part with undefined behaviour.


Finally -- and this is the point of the linked article -- the compiler is allowed to assume that the behaviour of the program is defined. Consequently, if the program includes an expression like (*p = *p) & (*q = *q), then the compiler can assume that p and q point to different non-overlapping objects. And once it makes that assumption, it may be able to better optimize expressions involving both *p and *q. It is also likely that once the compiler has made that assumption, then it can eliminate the entire computation of (*p = *p) & (*q = *q), because intermediate values of *p and *q (if there are any) are not observable if p and q are distinct. So you can think of that expression as a kind of declaration: you are promising the compiler that you have done whatever is necessary to guarantee that p and q point to different non-overlapping objects. (The compiler will not, and probably cannot, verify your claim. It will just take your word for it.)

The author then argues that this idiom is more powerful than the (somewhat controversial) restrict keyword. I have no doubt that it is, and it is probably possible to construct expressions like that to cover a number of restrictions which cannot be easily expressed with restrict. So to that extent, it seems an interesting idea. On the other hand, the precise expression is, to say the least, obscure and easy to get wrong.

Upvotes: 2

Iwillnotexist Idonotexist

Reputation: 13457

The ruling of the C11 Standard on the statement

(*p=*p) & (*q=*q);

is:

P1

§6.5p3

The grouping of operators and operands is indicated by the syntax. Except as specified later, side effects and value computations of subexpressions are unsequenced.

Since §6.5.10 Bitwise AND operator fails to mention sequencing of its operands, it follows that (*p=*p) and (*q=*q) are unsequenced.

P2

§6.5p2

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

Both assignments (*p=*p) and (*q=*q) are unsequenced w.r.t. each other by §6.5p3, and have a side-effect on the same object if p==q. Therefore, if p==q, then by §6.5p2 we have UB.

P3

§3.4.3

undefined behaviour

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements.

By this clause we know that the standard imposes no requirements on UB. This is commonly interpreted by compilers as a license to ignore the possibility that such behaviour occurs.

In particular, it allows the compiler to not handle the case p == q, which means that it may assume that p != q.

P1+P2+P3 -> C1

Because (*p=*p) and (*q=*q) may be assumed by the combined premises P1, P2 and P3 not to invoke UB, they may also be assumed to be loads and stores to different memory locations. This also means that the compiler may take the return value of f2 to be 3 and not 4, since *p = 1 and *q = 2 then write to distinct objects. If p == q, the Standard imposes no requirements on what occurs.

Upvotes: 4
