Reputation: 12817
I read this great (if too broad) question, and encountered some UB I didn't know about before.
The main cause of UB I see from time to time is modifying a variable twice between two sequence points, in things like `x = x++` or `z = y++ + ++y;`. Reading that modifying a variable twice between two sequence points is UB helped me see the underlying cause in these cases.
But what about things like a bit-shift by a negative amount (`int x = 8 << -1;`)? Is there a rule that can explain that, or should I memorize this as a unique UB possibility?
I looked here, and under the section Integer Overflows I found bit-shift with negatives listed, but I don't understand why they are related. When an `int` is shifted by too much, an overflow is caused, but IMO shifting by a negative amount is simply UB, and the problem isn't the bits that go "over the edge"...
I also looked here, but that didn't answer my question:
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
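The quoted rule can be turned into a small runtime guard (a sketch of my own; `shift_count_ok` is a hypothetical name, and the check assumes the promoted left operand is `unsigned int`):

```c
#include <assert.h>   /* for the checks below */
#include <limits.h>
#include <stdbool.h>

/* Hypothetical guard for the quoted rule: a shift count is valid only
 * if it is nonnegative and strictly less than the width of the
 * (promoted) left operand -- here assumed to be unsigned int. */
static bool shift_count_ok(int count)
{
    return count >= 0 && count < (int)(sizeof(unsigned int) * CHAR_BIT);
}
```

With such a guard, both `8 << -1` and an over-wide shift are rejected before the shift is ever evaluated.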
So my questions are:
Specifically, is bit-shift with negatives considered integer overflow and if so, why?
If not, is it a part of a bigger phenomena?
Are there (other) unique cases that can't be grouped under one underlying cause?
Upvotes: 4
Views: 196
Reputation: 26703
(Compiling an answer from comments, including mine.)
A good starting point for finding actual undefined behaviour (UB) is this set of references by Jonathan Leffler:
Yes, there are lots of cases, and grouping them is going to be tricky. Annex J.2 of the C11 standard documents undefined behaviours on pages 557-571 (there are only a few lines on each of the end pages, so it is a bit more than 14 pages).
A reference to a related article which discusses types of UB and tools for spotting them, and contains a long and (by the authors' intention) complete list of UB (courtesy of davmac):
https://blog.regehr.org/archives/1520
Two approaches for something "memorizable":
by Ajay Brahmakshatriya, focusing on unavoidable platform-dependency:
my general rule of thumb is - anything that would seem to change behavior with different implementations (target, platform, etc) is a red flag to "spot" UB
by Yunnosch, focusing on problems to balance standardising and optimising:
If it would probably be hard to make hardware suppliers agree on this, or would otherwise be hard to define clearly AND allow some room for optimised implementation, then it is probably UB.
Sadly, none of these "rules" is easy to apply. Checking the actual standard is inconvenient. The two rules of thumb require quite some experience: you either need to have designed a few compilers and/or processors, or to have suffered a lot from the differences between them.
So the actual answer to "Is there an easy way to spot UB?" is probably simply "No."
Upvotes: 1
Reputation: 20631
Specifically, is bit-shift with negatives considered integer overflow and if so, why?
It is not, because shifting 0 by any amount can never overflow, yet it is still undefined behaviour to shift a value of 0 by a negative amount. (You could arguably consider it integer overflow if you first re-interpret the shift amount as an unsigned integer: at that point it becomes large and certainly beyond the allowed range, and a shift by that amount, interpreted as multiplication by a power of 2, would certainly overflow whenever the shifted value was non-zero.)
In short, a bit-shift by negative yields undefined behaviour because the language standard says that it does.
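That re-interpretation can be made concrete (a sketch; `as_unsigned_count` is a hypothetical helper name of mine):

```c
#include <assert.h>   /* for the checks below */
#include <limits.h>

/* Hypothetical illustration: converting a negative shift count to
 * unsigned, as the answer describes, always yields a value at least
 * as large as the width of int -- far outside the valid shift range. */
static unsigned as_unsigned_count(int count)
{
    return (unsigned)count;   /* -1 wraps to UINT_MAX, -2 to UINT_MAX - 1, ... */
}
```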
If not, is it a part of a bigger phenomena?
John Regehr gives some broad categories of UB in a blog post. Shift by invalid amounts is in the "other UB" category...
Are there (other) unique cases that can't be grouped under one underlying cause?
Yes, see the above post. Among others (these are directly lifted from the blog post):
You could possibly categorise these and the other examples in some way, but it's up to you how you'd want to do that.
In particular, the last example above (about the source file not ending in a new-line) shows just how arbitrary some of the rules are.
Upvotes: 1
Reputation: 81105
In the case of `x<<y` with `y` negative, there are some platforms which will process something like `z=x<<y` with microcode equivalent to:
unsigned temp = x;
unsigned count = y;
while (count--)
    temp <<= 1;
z = temp;
If `y` is negative, that loop might run a very long time; if it's handled at the microcode level (I think some Transputer chips were that way), it may disable interrupts for many minutes, which could disrupt other aspects of the system.
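In plain C, that microcode loop would look like this (a sketch; `shift_by_loop` is my name, and note that with an unsigned counter a negative `y` would wrap to a huge count, so the loop would grind through billions of iterations rather than hang forever):

```c
#include <assert.h>   /* for the checks below */

/* C rendering of the microcode loop: shift left one bit at a time,
 * `count` times.  For counts within the valid range this matches the
 * ordinary x << count. */
static unsigned shift_by_loop(unsigned x, unsigned count)
{
    unsigned temp = x;
    while (count--)
        temp <<= 1;
    return temp;
}
```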
On most platforms it would cost nothing, outside of contrived scenarios, for a compiler to guarantee that `x<<y` would have no side-effects for any values of `x` or `y` beyond yielding a possibly-meaningless value; it would in fact be easier for a compiler to generate code without side-effects than to do anything else. Unfortunately, some compiler writers think they should seek out "clever" ways of exploiting the fact that `y` "can't" be negative, triggering arbitrarily bad consequences without regard for whether doing so is actually useful, perhaps in the mistaken belief that "clever" and "stupid" are antonyms.
Upvotes: 0