Reputation: 71
I have the following code, compiled with GCC 8.3 on x86-64 Linux:
// file: inc.cc
int inc_value(int* x) {
    (*x)++;
    //std::atomic<int> ww;
    //ww.load(std::memory_order_relaxed);
    (*x)++;
    return *x;
}
It generates the following assembly for the increment operations:
g++ -S inc.cc -o inc.s -O3
addl $2, %eax
After I enable the atomic variable (by uncommenting the two lines), I get:
addl $1, (%rdi)
movl -4(%rsp), %eax
movl (%rdi), %eax
addl $1, %eax
movl %eax, (%rdi)
It seems that the relaxed atomic load works like a compiler fence (just like asm volatile("":::"memory");), so GCC cannot reorder the instructions around it.
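For comparison, here is a minimal sketch of the same function with an explicit compiler fence in place of the atomic load. The name inc_value_fenced is hypothetical, and asm volatile is a GCC/Clang extension, so this assumes one of those compilers. With the fence in place, the compiler has to assume *x may have been read or written in between, so it should keep the two increments separate.

```cpp
// Hypothetical variant of inc_value, for comparison only.
// asm volatile("" ::: "memory") is a GCC/Clang extension: it emits no
// instructions but tells the compiler that memory may have been accessed.
int inc_value_fenced(int* x) {
    (*x)++;
    asm volatile("" ::: "memory");  // compiler-only fence, no CPU fence
    (*x)++;
    return *x;
}
```

The return value is the same as for inc_value; only the generated code around the fence is expected to differ.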
What I already know:
According to the above, GCC should be able to reorder and optimize the code down to addl $2, %eax, as if the relaxed operation had no effect. But the result shows that GCC treats the relaxed load as a compiler fence and stops any reordering. So I have the following questions:
Upvotes: 5
Views: 110
Reputation: 57922
For x86-64, does GCC always generate a full compiler fence for an atomic operation, even if it is a relaxed operation?
No, it does not always do so, and there is no reason to expect that it should.
Here is a similar example:
#include <atomic>
std::atomic<int> x;
int foo(int& y) {
    int z = y;
    x.load(std::memory_order_relaxed);
    return y+z;
}
The generated asm (godbolt) all the way back to GCC 6.1 (earliest version on Godbolt supporting <atomic>) is:
movl (%rdi), %eax
movl x(%rip), %edx
addl %eax, %eax
ret
Notice that y is not reloaded after the load of x; the compiler assumes (as it is allowed to do under the C++ memory model) that y has not changed, so it can reuse the value in %eax.
I think you simply stumbled on an example where GCC misses an optimization that it could have performed. It's true that compilers often don't optimize atomics and surrounding operations as aggressively as the memory model would allow, because they think it makes for a better quality of implementation. So I can't say whether the example you found represents a missed optimization bug, or deliberate behavior intended by the GCC developers.
If so, for x86-64, is it enough to use only 2 orders: relaxed-order and seq_cst?
The above certainly suggests not.
My example is not quite a demonstration of this, though, because the optimization performed would still be valid if the load were acquire. Since we already read from y both before and after the load, a concurrent write to y would be a data race and the behavior would be undefined. So the compiler is still justified in assuming that there is no concurrent write and that the reload is unnecessary.
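By contrast, here is a sketch of the pattern where acquire does constrain the compiler: the non-atomic variable is read only after the acquire load, so there is no earlier read to reuse and no data race. The names data, ready, producer and consumer are made up for illustration; they are not from the question.

```cpp
#include <atomic>

int data = 0;                    // plain, non-atomic payload
std::atomic<bool> ready{false};  // flag that publishes the payload

// Writer side: store the payload, then publish it with release.
void producer() {
    data = 42;
    ready.store(true, std::memory_order_release);
}

// Reader side: wait for the flag with acquire, then read the payload.
// The read of data must not be moved above the acquire load.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the writer has published
    }
    return data;
}
```

If producer and consumer run in different threads, consumer returns 42. With memory_order_relaxed on both sides, the read of data would race with the write, so the language would no longer guarantee that result.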
I haven't yet been able to find an example where GCC optimizes around a relaxed operation in a way that would be forbidden for acquire/release, but I strongly suspect that they exist, or will exist in the future.
Upvotes: 1