Reputation: 1
I am working on a nice tool, which requires the atomic swap of two different 64-bit values. On the amd64 architecture it is possible with the XCHGQ
instruction (see here in doc, warning: it is a long pdf).
Correspondingly, gcc has some atomic builtins which would ideally do the same, as it is visible for example here.
Using these 2 docs I produced the following simple C function, for the atomic swapping of two, 64-bit values:
void theExchange(u64* a, u64* b) {
__atomic_exchange(a, b, b, __ATOMIC_SEQ_CST);
};
(Btw, it wasn't really clear to me, why needs an "atomic exchange" 3 operands.)
It was to me a little bit fishy, that the gcc __atomic_exchange
macro uses 3 operands, so I tested its asm output. I compiled this with a gcc -O6 -masm=intel -S
and I've got the following output:
.LHOTB0:
.p2align 4,,15
.globl theExchange
.type theExchange, @function
theExchange:
.LFB16:
.cfi_startproc
mov rax, QWORD PTR [rsi]
xchg rax, QWORD PTR [rdi] /* WTF? */
mov QWORD PTR [rsi], rax
ret
.cfi_endproc
.LFE16:
.size theExchange, .-theExchange
.section .text.unlikely
As we can see, the result function contains not only a single data move, but three different data movements. Thus, as I understood this asm code, this function won't be really atomic.
How is it possible? Maybe I misunderstood some of the docs? I admit, the gcc builtin doc wasn't really clear to me.
Upvotes: 0
Views: 2123
Reputation: 58762
This is the generic version of __atomic_exchange_n (type *ptr, type val, int memorder)
where only the exchange operation on ptr
is atomic, the reading of val
is not. In the generic version, val
is accessed via pointer, but the atomicity still does not apply to it. The pointer is so that it will work with multiple sizes, when the compiler has to call an external helper:
The four non-arithmetic functions (load, store, exchange, and compare_exchange) all have a generic version as well. This generic version works on any data type. It uses the lock-free built-in function if the specific data type size makes that possible; otherwise, an external call is left to be resolved at run time. This external call is the same format with the addition of a ‘size_t’ parameter inserted as the first parameter indicating the size of the object being pointed to. All objects must be the same size.
Upvotes: 2