BeeOnRope

Reputation: 64895

Is it legal to optimize away stores/construction of volatile stack variables?

I noticed that clang and gcc optimize away the construction of or assignment to a volatile struct declared on the stack, in some scenarios. For example, the following code:

#include <cstdint>

struct nonvol2 {
    uint32_t a, b;
};

void volatile_struct2()
{
    volatile nonvol2 temp = {1, 2};
}

Compiles on clang to:

volatile_struct2(): # @volatile_struct2()
  ret

On the other hand, gcc does not remove the stores, although it does optimize the two implied stores into a single one:

volatile_struct2():
        movabs  rax, 8589934593
        mov     QWORD PTR [rsp-8], rax
        ret

Oddly, clang won't optimize away a volatile store to a single int variable:

void volatile_int() {
    volatile int x = 42;
}

Compiles to:

volatile_int(): # @volatile_int()
  mov dword ptr [rsp - 4], 42
  ret

Furthermore, a struct with one member rather than two is not optimized away either.

Although gcc doesn't remove the construction in this particular case, it does perhaps even more aggressive optimizations in the case that the struct members themselves are declared volatile, rather than the struct itself at the point of construction:

typedef struct {
    volatile uint32_t a, b;
} vol2;

void volatile_def2()
{
    vol2 temp = {1, 2};
    vol2 temp2 = {1, 2};
    temp.a = temp2.a;
    temp.a = temp2.a;
}

compiles down to a single ret.

While it seems entirely "reasonable" to remove these stores, which are pretty much impossible to observe by any reasonable process, my impression was that the standard treats volatile loads and stores as part of the observable behavior of the program (along with calls to I/O functions), full stop. The implication is that they are not subject to removal under the "as-if" rule, since removing them would by definition change the observable behavior of the program.

Am I wrong about that, or is clang breaking the rules here? Perhaps construction is excluded from the cases where volatile must be assumed to have side effects?

Upvotes: 9

Views: 375

Answers (2)

supercat

Reputation: 81115

From the point of view of the Standard, there is no requirement that implementations document anything about how any objects are physically stored in memory. Even if an implementation documents the behavior of using pointers of type unsigned char* to access objects of a certain type, an implementation would be allowed to physically store data some other way and then have the code for character-based reads and writes adjust behaviors suitably.

If an execution platform specifies a relationship between abstract-machine objects and storage seen by the CPU, and defines ways by which accesses to certain CPU addresses might trigger side effects the compiler doesn't know about, a quality compiler suitable for low-level programming on that platform should generate code where the behavior of volatile-qualified objects is consistent with that specification. The Standard makes no attempt to mandate that all implementations be suitable for low-level programming (or any other particular purpose, for that matter).

If the address of an automatic variable is never exposed to outside code, a volatile qualifier need have only two effects:

  1. If setjmp is called within a function, a compiler must do whatever is necessary to ensure that longjmp will not disrupt the values of any volatile-qualified objects, even if they were written between the setjmp and longjmp. Absent the qualifier, the value of objects written between setjmp and longjmp would become indeterminate when a longjmp is executed.

  2. Rules which would allow a compiler to presume that any loops which don't have side effects will run to completion do not apply in cases where a volatile object is accessed within the loop, whether or not an implementation would define any means by which such access would be observable.

Except in those cases, the as-if rule would allow a compiler to implement the volatile qualifier in the abstract machine in a way that has no relation to the physical machine.

Upvotes: 4

Nicol Bolas

Reputation: 473212

Let us investigate what the standard directly says. The behavior of volatile is defined by a pair of statements. [intro.execution]/7:

The least requirements on a conforming implementation are:

  • Accesses through volatile glvalues are evaluated strictly according to the rules of the abstract machine.

...

And [intro.execution]/14:

Reading an object designated by a volatile glvalue (6.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.

Well, [intro.execution]/14 does not apply because nothing in the above code constitutes "reading an object". You initialize it and destroy it; it is never read.

So that leaves [intro.execution]/7. The phrase of importance here is "accesses through volatile glvalues". While temp certainly is a volatile value, and it certainly is a glvalue... you never actually access through it. Oh yes, you initialize the object, but that doesn't actually access "through" temp as a glvalue.

That is, temp as an expression is a glvalue, per the definition of glvalue: "an expression whose evaluation determines the identity of an object, bit-field, or function." The statement creating and initializing temp results in a glvalue, but the initialization of temp isn't accessing through a glvalue.

Think of volatile like const. The rules about const objects don't apply until after it is initialized. Similarly, the rules about volatile objects don't apply until after it is initialized.

So there's a difference between volatile nonvol2 temp = {1, 2}; and volatile nonvol2 temp; temp.a = 1; temp.b = 2;. And Clang certainly does the right thing in that case.
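The distinction drawn above can be sketched as follows (the function names are invented for illustration; the reasoning in the comments reflects this answer's reading of the standard, not a guarantee about any particular compiler):

```cpp
#include <cstdint>

struct nonvol2 { std::uint32_t a, b; };

// Initialization form: under the reading above, {1, 2} initializes the
// object but is not an "access through a volatile glvalue", so the
// stores may arguably be elided.
void via_initialization() {
    volatile nonvol2 temp = {1, 2};
    (void)temp;
}

// Assignment form: each assignment goes through the volatile-qualified
// lvalues temp.a and temp.b after initialization, so these are accesses
// through volatile glvalues and must be emitted.
std::uint32_t via_assignment() {
    volatile nonvol2 temp;
    temp.a = 1;
    temp.b = 2;
    return temp.a + temp.b;  // volatile reads
}
```

Compiling the two forms side by side (e.g. on Compiler Explorer) is an easy way to see whether a given compiler actually treats them differently.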

That being said, the inconsistency of Clang with regard to this behavior (optimizing it out only when using a struct, and only when using a struct that contains more than one member) suggests that this is probably not a formal optimization by the writers of Clang. That is, they're not taking advantage of the wording so much as this just being an odd quirk of some accidental code coming together.


Although gcc doesn't remove the construction in this particular case, it does perhaps even more aggressive optimizations in the case that the struct members themselves are declared volatile, rather than the struct itself at the point of construction:

GCC's behavior here is:

  1. Not in accord with the standard, as it is in violation of [intro.execution]/7, but
  2. There's absolutely no way to prove that it isn't compliant with the standard.

Given the code you wrote, there is simply no way for a user to detect whether or not those reads and writes are actually happening. And I rather suspect that the moment you do anything to allow the outside world to see it, those changes will suddenly appear in the compiled code. However much the standard wishes to call it "observable behavior", the fact is that by C++'s own memory model, nobody can see it.

GCC gets away with the crime due to lack of witnesses. Or at least credible witnesses (anyone who could see it would be guilty of invoking UB).

So you should not treat volatile like some optimization off-switch.

Upvotes: 5
