Reputation: 406
Consider the following code, which uses a std::atomic
to atomically load a 64-bit object.
#include <atomic>
struct A {
int32_t x, y;
};
A f(std::atomic<A>& a) {
return a.load(std::memory_order_relaxed);
}
With GCC, good things happen, and the following code is generated. (https://godbolt.org/z/zS53ZF)
f(std::atomic<A>&):
mov rax, QWORD PTR [rdi]
ret
This is exactly what I'd expect, since I see no reason why a 64-bit struct shouldn't be able to be treated like any other 64-bit word in this situation.
With Clang, however, the story is different. Clang generates the following. (https://godbolt.org/z/d6uqrP)
f(std::atomic<A>&): # @f(std::atomic<A>&)
push rax
mov rsi, rdi
mov rdx, rsp
mov edi, 8
xor ecx, ecx
call __atomic_load
mov rax, qword ptr [rsp]
pop rcx
ret
mov rdi, rax
call __clang_call_terminate
__clang_call_terminate: # @__clang_call_terminate
push rax
call __cxa_begin_catch
call std::terminate()
This is problematic for me for several reasons:
__atomic_load
, which means that my binary needs to be linked with libatomic. This means I need different lists of libraries to link depending on whether user's of my code use GCC or Clang.The important question on my mind right now is whether there is a way to get Clang to also convert the load into a single instruction. We are using this as part of a library that we plan to distribute to others, so we cannot rely on a particular compiler being used. The solution suggested to me so far is to use type punning and store the struct inside a union alongside a 64-bit int, since Clang does correctly load 64-bit ints atomically in one instruction. I am skeptical of this solution, however, since although it appears to work on all major compilers, I have read that it is in fact undefined behaviour. Such code is also not particularly friendly for others to read and understand if they are not familiar with the trick.
To summarize, is there a way to atomically load a 64-bit struct that:
Upvotes: 5
Views: 753
Reputation: 365267
This clang missed optimization only happens with libstdc++; clang on Godbolt inlines as we expect for -stdlib=libc++
. https://godbolt.org/z/Tt8XTX.
It seems that giving the struct 64-bit alignment is sufficient to hand-hold clang.
libstdc++
's std::atomic
template does that for types that are small enough to be atomic when naturally aligned, but perhaps clang++ is only seeing the alignment of the underlying type, not the class member of atomic<T>
, in the libstdc++ implementation. I haven't investigated; someone should report this to the clang / LLVM bugzilla.
#include <atomic>
#include <stdint.h> // you forgot this header.
struct A {
alignas(std::atomic_int64_t) int32_t x; // same alignment as std::atomic uses for atomic<int64_t>
int32_t y; // this one must be separate, otherwise y would also be aligned -> 16-byte object
};
A f(std::atomic<A>& a) {
return a.load(std::memory_order_relaxed);
}
Aligning the same as std::atomic<int64_t>
should give sufficient alignment on every target where 64-bit object can be lock-free at all. alignof(int64_t)
on a 32-bit ABI might only be 4, and I didn't use alignas(8)
to avoid over-alignment on systems where char is 32-bit and sizeof(int64_t) = 2.
alignas(2*sizeof(int32_t))
would always naturally-align the struct, even on targets where atomic_int64_t isn't that aligned, e.g. because it's not lock-free. That doesn't matter much.
Godbolt; this workaround is still needed with clang13.0
# clang++ 9.0 -std=gnu++17 -O3; g++ is the same
f(std::atomic<A>&):
mov rax, qword ptr [rdi]
ret
BTW, no, the libatomic
library function won't use a lock; it does know that 8-byte aligned loads are naturally atomic and that other use threads will be using plain loads/stores, not locks.
Older clang at least uses call __atomic_load_8
instead of the generic variable-sized one, but that's still a big missed optimization.
Fun fact: clang -m32
will use lock cmpxchg8b
to implement an 8-byte atomic load, instead of using SSE or fild
like GCC does. But that's ABI-compatible with using SSE or x87 so it's not locked-in to using that sub-optimal way. :/
Upvotes: 7