BeeOnRope
BeeOnRope

Reputation: 64955

Can a C compiler implement signed right shift "unreasonably"?

The title field isn't long enough to a capture a detailed question, so for the record, my actual question defines "unreasonable" in a specific way:

Is it legal1 for a C implementation have an arithmetic right shift operator that returns different results, over time, for the identical argument values? That is, must >> be a true function?

Imagine you wanted to write portable code using right-shift >> on signed values in C. Unfortunately, for you, the most efficient implementation of some critical part of your algorithm is fastest when signed right shifts fill the sign bit (i.e., they are arithmetic right shifts). Now since this behavior is implementation defined you are kind of screwed if you want to write portable code that takes advantage of it.

Just reading the compiler's documentation (which it is required to provide in the standard, but may or may not actually be easy to access or even exist) is great if you know all the compilers you will run on and will ever run on, but since that is often impossible, you might look for a way to do this portably.

One way I thought of is to simply test the compiler's behavior at runtime: it it appears to implement arithmetic2 right shift, use the optimized algorithm, but if not use a fallback that doesn't rely on it. Of course, just checking that say (short)0xFF00 >> 4 == 0xFFF0 isn't enough since it doesn't exclude that perhaps char or int values work differently, or even the weird case that it fills for some shift amounts or values but not for others3.

So given that a comprehensive approach would be to exhaustively check all shift values and amounts. For all 8-bit char inputs that's only 28 LHS values and 23 RHS values for a total of 211, while short (typically, but let's say int16_t if you want to be pedantic) totals only 220, which could be validated in a fraction of a second on modern hardware4. Doing all 237 32-bit values would take a few seconds on decent hardware, but still possibly reasonable. 64-bit is out for the foreseeable future however.

Let's say you did that and found that the >> behavior exactly matches the desired arithmetic shift behavior. Are you safe to rely on it according to the standard: does the standard constrain the implementation not to change its behavior at runtime? That is, does the behavior have to be expressible as a function of it's inputs, or can (short)0xFF00 >> 4 be 0xFFF0 one moment and then 0x0FF0 (or any other value) the next?

Now this question is mostly theoretical, but violating this probably isn't as crazy as it might seem, given the presence of hybrid architectures such as big.LITTLE that dynamically move processes from on CPU to another with possible small behavioral differences, or live VM migration between say chips manufactured by Intel and AMD with subtle (usually undocumented) instruction behavior differences.


1 By "legal" I mean according to the C standard, not according to the laws of your country, of course.

2 I.e., it replicates the sign bit for newly shifted in bits.

3 That would be kind of insane, but not that insane: x86 has different behavior (overflow flag bit setting) for shifts of 1 rather than other shifts, and ARM masks the shift amount in a surprising way that a simple test probably wouldn't detect.

4 At least anything better than a microcontroller. The inner validation loop is a few simple instructions, so a 1 GHz CPU could validate all ~1 million short values in ~1 ms at one instruction-per-cycle.

Upvotes: 4

Views: 206

Answers (2)

supercat
supercat

Reputation: 81189

The authors of the Standard made no effort to imagine and forbid every unreasonable way implementations might behave. In the days prior to the Standard, implementations almost always defined many behaviors based upon how the underlying platform worked, and the authors of the Standard almost certainly expected that most implementations would do likewise. Since the authors of the Standard didn't want to rule out the possibility that it might sometimes be useful for implementations to behave in other ways (e.g. to support code which targeted other platforms), however, they left many things pretty much wide open.

The Standard is somewhat vague as to the level of precision with which "Implementation Defined" behaviors need to be documented. I think the intention is that a person of reasonable intelligence reading the specification should be able to predict how a piece of code with Implementation-Defined behavior would behave, but I'm not sure that defining an operation as "yielding whatever happens to be in register 7" would be non-conforming.

From a practical perspective, what should matter is whether one is targeting quality implementations for platforms that have been developed within the last quarter century, or whether one is trying to force even deliberately-obtuse compilers to behave in controlled fashion. If the former, it may be worth using a static assert to ensure the -1>>1==-1, but quality compilers for modern hardware where that test passes are going to use arithmetic right shift consistently. While targeting code to obtuse compilers might possibly have some purpose, it's not generally possible to guard against all the ways that a pathological-but-conforming compiler could sabotage one's code, and efforts spent attempting to do so could often be more effectively spent on getting a quality compiler.

Upvotes: 3

John Bollinger
John Bollinger

Reputation: 180361

Let's say you did that and found that the >> behavior exactly matches the desired arithmetic shift behavior. Are you safe to rely on it according to the standard: does the standard constrain the implementation not to change its behavior at runtime? That is, does the behavior have to be expressible as a function of it's inputs, or can (short)0xFF00 >> 4 be 0xFFF0 one moment and then 0x0FF0 (or any other value) the next?

The standard does not place any requirements on the form or nature of the (required) definition of the behavior of right shifting a negative integer. In particular, it does not forbid the definition to be conditional on compile-time or run-time properties beyond the operands. Indeed, this is the case for implementation-defined behavior in general. The standard defines the term simply as

unspecified behavior where each implementation documents how the choice is made.

So, for example, an implementation might provide a macro, a global variable, or a function by which to select between arithmetic and logical shifts. Implementations might also define right-shifting a negative number to do less plausible, or even wildly implausible things.

Testing the behavior is a reasonable approach, but it gets you only a probabilistic answer. Nevertheless, in practice I think it's pretty safe to perform just a small number of tests -- maybe as few as one -- for each LHS data type of interest. It is extremely unlikely that you'll run into an implementation that implements anything other than standard arithmetic (always) or logical (always) right shift, or that you'll encounter one where the choice between arithmetic and logical shifting varies for different operands of the same (promoted) type.

Upvotes: 3

Related Questions