song xs

Reputation: 71

Does GCC treat a relaxed atomic operation as a compiler fence?

I have the following code, compiled with GCC 8.3 on x86-64 Linux:

// file: inc.cc
#include <atomic>  // only needed once the two commented-out lines are enabled

int inc_value(int* x) {
  (*x)++;
  //std::atomic<int> ww;
  //ww.load(std::memory_order_relaxed);
  (*x)++;
  return *x;
}

This generates the following assembly for the two increments:

 g++ -S inc.cc -o inc.s -O3
 addl  $2, %eax  

After I enable the atomic variable (by uncommenting the two lines), I get:

  addl  $1, (%rdi)  
  movl  -4(%rsp), %eax
  movl  (%rdi), %eax
  addl  $1, %eax
  movl  %eax, (%rdi)

It seems that the relaxed atomic load works like a compiler fence (just like asm volatile("":::"memory");), so GCC cannot reorder or merge the instructions around it.

What I already know:

  1. cppreference says there is no memory-ordering guarantee around a relaxed operation, so the compiler is allowed to reorder across it.
  2. x86-64 has a strong TSO memory model; atomic loads and stores don't generate lock/xchg/mfence CPU-fence instructions and only act as a compiler fence (except for seq_cst).

Given the above, GCC should be able to reorder and optimize the code down to addl $2, %eax, as if the relaxed operation had no effect. But the result shows that GCC treats the relaxed load as a compiler fence and stops any reordering across it. So I have the following questions:

  1. For x86-64, does GCC always generate a full compiler fence for an atomic operation, even if it's a relaxed operation? Also, GCC generates memory operands instead of keeping the temporary value in a register. Does this "atomic compiler fence" also imply a memory side effect to GCC, so that it has to store and reload values from memory around the fence?
  2. If so, for x86-64, is it enough to use only two orders, relaxed and seq_cst? Since x86-64 has TSO guarantees and a relaxed operation is treated as a full compiler fence, it could replace the use of release/acquire/consume.

Upvotes: 5

Views: 110

Answers (1)

Nate Eldredge

Reputation: 57922

For x86-64, does GCC always generate a full compiler fence for an atomic operation, even if it's a relaxed operation?

No, it does not always do so, and there is no reason to expect that it should.

Here is a similar example:

#include <atomic>
std::atomic<int> x;
int foo(int& y) {
    int z = y;
    x.load(std::memory_order_relaxed);
    return y+z;
}

The generated asm (godbolt) all the way back to GCC 6.1 (earliest version on Godbolt supporting <atomic>) is:

  movl (%rdi), %eax
  movl x(%rip), %edx
  addl %eax, %eax
  ret

Notice that y is not reloaded after the load of x; the compiler assumes (as it is allowed to do under the C++ memory model) that y has not changed, so it can reuse the value in %eax.
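For contrast, here is a minimal sketch that puts an explicit compiler barrier where the relaxed load was. The function names with_relaxed_load and with_compiler_barrier are hypothetical, and std::atomic_signal_fence is used as a portable stand-in for asm volatile("":::"memory"). In practice GCC is expected to reload y after the barrier but not after the relaxed load. That is an assumption to check against the generated asm for a given compiler version, not something the standard spells out in terms of instructions.

```cpp
#include <atomic>

std::atomic<int> x{0};

// Same shape as foo() above: the relaxed load gives no ordering guarantee,
// so the compiler may reuse the value of y it already has in a register.
int with_relaxed_load(int& y) {
    int z = y;
    x.load(std::memory_order_relaxed);
    return y + z;
}

// Explicit compiler-only barrier (emits no CPU fence instruction).
// Assumption: the optimizer treats this like a "memory" clobber and
// reads y from memory again afterwards.
int with_compiler_barrier(int& y) {
    int z = y;
    std::atomic_signal_fence(std::memory_order_seq_cst);
    return y + z;
}
```

Both functions return the same value in a single-threaded program; the difference, if any, only shows up in the generated code (for example with g++ -O3 -S).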

I think you simply stumbled on an example where GCC misses an optimization that it could have performed. It's true that compilers often don't optimize atomics and surrounding operations as aggressively as the memory model would allow, because they think it makes for a better quality of implementation. So I can't say whether the example you found represents a missed optimization bug, or deliberate behavior intended by the GCC developers.

If so, for x86-64, is it enough to use only two orders, relaxed and seq_cst?

The above certainly suggests not.

It is not quite a conclusive example, though, because the optimization performed would still be valid if the load were acquire. Since the function reads y both before and after the load, any concurrent write to y would be a data race, and the behavior would be undefined. So the compiler is still justified in assuming that there is no concurrent write and that the reload is unnecessary.

I haven't yet been able to find an example where GCC optimizes around a relaxed operation in a way that would be forbidden for acquire/release, but I strongly suspect that they exist, or will exist in the future.
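To illustrate why release/acquire still matter at the source level, here is a minimal message-passing sketch. All names (payload, ready, producer, consumer, run_once) are hypothetical and not taken from the question. If both flag operations were relaxed instead, the access to payload would be a data race under the C++ memory model, and the compiler would be allowed to move the payload store past the flag store, regardless of what the x86-64 hardware guarantees.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the payload

void producer() {
    payload = 42;
    // release: the write to payload may not be moved after this store
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // acquire: the read of payload may not be moved before this load
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the producer has published
    }
    return payload;
}

// Runs one producer/consumer pair and returns what the consumer saw.
int run_once() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

On x86-64 the release store and acquire load should both compile to plain mov instructions, so the orderings cost nothing at run time there; what they buy is the guarantee that the compiler keeps the accesses in this order.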

Upvotes: 1
