Reputation:
I am using cmpxchg (compare-and-exchange) in i686 architecture for 32 bit compare and swap as follows.
(Editor's note: the original 32-bit example was buggy, but the question isn't about it. I believe this version is safe, and as a bonus compiles correctly for x86-64 as well. Also note that inline asm isn't needed or recommended for this; __atomic_compare_exchange_n or the older __sync_bool_compare_and_swap work for int32_t or int64_t on i486 and x86-64. But this question is about doing it with inline asm, in case you still want to.)
// note that this function doesn't return the updated oldVal
static int CAS(int *ptr, int oldVal, int newVal)
{
unsigned char ret;
__asm__ __volatile__ (
" lock\n"
" cmpxchgl %[newval], %[mem]\n"
" sete %0\n"
: "=q" (ret), [mem] "+m" (*ptr), "+a" (oldVal)
: [newval]"r" (newVal)
: "memory"); // barrier for compiler reordering around this
return ret; // ZF result, 1 on success else 0
}
What is the equivalent on the x86_64 architecture for a 64-bit compare and swap?
static int CAS(long *ptr, long oldVal, long newVal)
{
unsigned char ret;
// ?
return ret;
}
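For comparison, the builtin route mentioned in the editor's note might look like the sketch below. It assumes GCC or Clang; the function names cas32 and cas64 are made up for illustration and are not part of the original post.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch using the GCC/Clang __atomic builtin instead of inline asm.
   The compiler picks the operand size from the pointer type, so the
   32-bit and 64-bit versions differ only in their types. */
static bool cas32(int32_t *ptr, int32_t oldVal, int32_t newVal)
{
    /* 'false' selects the strong variant (no spurious failures). */
    return __atomic_compare_exchange_n(ptr, &oldVal, newVal, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

static bool cas64(int64_t *ptr, int64_t oldVal, int64_t newVal)
{
    return __atomic_compare_exchange_n(ptr, &oldVal, newVal, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

Like the inline-asm version, these return only success or failure; on failure the builtin writes the current value into the local copy of oldVal, which is then discarded.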
Upvotes: 5
Views: 14668
Reputation: 340366
The x64 architecture supports a 64-bit compare-exchange using the good, old cmpxchg instruction. Or you could also use the somewhat more complicated cmpxchg8b instruction (from the "AMD64 Architecture Programmer's Manual Volume 1: Application Programming"):
The CMPXCHG instruction compares a value in the AL or rAX register with the first (destination) operand, and sets the arithmetic flags (ZF, OF, SF, AF, CF, PF) according to the result. If the compared values are equal, the source operand is loaded into the destination operand. If they are not equal, the first operand is loaded into the accumulator. CMPXCHG can be used to try to intercept a semaphore, i.e. test if its state is free, and if so, load a new value into the semaphore, making its state busy. The test and load are performed atomically, so that concurrent processes or threads which use the semaphore to access a shared object will not conflict.

The CMPXCHG8B instruction compares the 64-bit values in the EDX:EAX registers with a 64-bit memory location. If the values are equal, the zero flag (ZF) is set, and the ECX:EBX value is copied to the memory location. Otherwise, the ZF flag is cleared, and the memory value is copied to EDX:EAX.

The CMPXCHG16B instruction compares the 128-bit value in the RDX:RAX and RCX:RBX registers with a 128-bit memory location. If the values are equal, the zero flag (ZF) is set, and the RCX:RBX value is copied to the memory location. Otherwise, the ZF flag is cleared, and the memory value is copied to rDX:rAX.
Different assembler syntaxes may need to have the length of the operations specified in the instruction mnemonic if the size of the operands can't be inferred. This may be the case for GCC's inline assembler - I don't know.
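To make the quoted CMPXCHG description concrete, here is a plain-C model of what a single 64-bit cmpxchg does. It is only a reading aid: it is deliberately not atomic, and the name cmpxchg_model and its parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* NOT atomic: a reference model of one CMPXCHG, for illustration only.
   dest        = the memory (destination) operand
   accumulator = the value held in rAX
   src         = the source register operand
   The return value plays the role of ZF. */
static bool cmpxchg_model(uint64_t *dest, uint64_t *accumulator, uint64_t src)
{
    if (*dest == *accumulator) {
        *dest = src;           /* equal: source is stored to destination */
        return true;           /* ZF = 1 */
    }
    *accumulator = *dest;      /* not equal: destination is loaded into rAX */
    return false;              /* ZF = 0 */
}
```

The real instruction performs these steps as one indivisible operation only when it carries the lock prefix; the model just shows which operand ends up where.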
Upvotes: 1
Reputation: 882146
The x86_64 instruction set has the cmpxchgq (q for quadword) instruction for 8-byte (64 bit) compare and swap.
There's also a cmpxchg8b instruction which will work on 8-byte quantities but it's more complex to set up, needing you to use edx:eax and ecx:ebx rather than the more natural 64-bit rax. The reason this exists almost certainly has to do with the fact Intel needed 64-bit compare-and-swap operations long before x86_64 came along. It still exists in 64-bit mode, but is no longer the only option.
But, as stated, cmpxchgq is probably the better option for 64-bit code.
If you need to cmpxchg a 16-byte object, the 64-bit version of cmpxchg8b is cmpxchg16b. It was missing from the very earliest AMD64 CPUs, so compilers won't generate it for std::atomic::compare_exchange on 16B objects unless you enable -mcx16 (for gcc). Assemblers will still assemble it, but beware that your binary won't run on the earliest K8 CPUs. (This only applies to cmpxchg16b, not to cmpxchg8b in 64-bit mode, or to cmpxchgq.)
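Applied to the function from the question, the cmpxchgq suggestion could be sketched as follows. This assumes GCC or Clang extended asm; the name CAS64 and the use of int64_t instead of long (long is only 32 bits on some x86-64 ABIs) are choices made here, not taken from the question. The non-x86-64 branch is only a fallback so the sketch still compiles elsewhere.

```c
#include <stdint.h>

/* Returns 1 if *ptr equalled oldVal and was replaced by newVal, else 0.
   Like the 32-bit version in the question, it does not hand back the
   value that was actually found in memory. */
static int CAS64(int64_t *ptr, int64_t oldVal, int64_t newVal)
{
#if defined(__x86_64__)
    unsigned char ret;
    __asm__ __volatile__ (
        "lock cmpxchgq %[newval], %[mem]\n\t"  /* compares RAX with *ptr */
        "sete %[ret]"                          /* ret = ZF */
        : [ret] "=q" (ret), [mem] "+m" (*ptr), "+a" (oldVal)
        : [newval] "r" (newVal)
        : "memory", "cc");                     /* compiler barrier; flags change */
    return ret;
#else
    /* Fallback for other targets (assumption: GCC/Clang builtins exist). */
    return __atomic_compare_exchange_n(ptr, &oldVal, newVal, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
}
```

The only changes from the 32-bit version are the q suffix on the mnemonic and the 64-bit operand types; the "a" constraint now resolves to rax rather than eax.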
Upvotes: 7
Reputation: 149
Usage of cmpxchg8b, from the AMD64 Architecture Programmer's Manual, Volume 3:
Compare EDX:EAX register to 64-bit memory location. If equal, set the zero flag (ZF) to 1 and copy the ECX:EBX register to the memory location. Otherwise, copy the memory location to EDX:EAX and clear the zero flag.
I use cmpxchg8b to implement a simple mutex lock function on an x86-64 machine. Here is the code:
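    .text
    .align 8
    .global mutex_lock
# mutex_lock(lock): spin until the 64-bit value at (%rdi) changes from 0 to 1
mutex_lock:
    pushq %rbp
    movq %rsp, %rbp
    pushq %rbx              # rbx is callee-saved and is overwritten below
.L1:
    movl $0, %edx           # expected value EDX:EAX = 0 (unlocked)
    movl $0, %eax
    movl $0, %ecx           # new value ECX:EBX = 1 (locked)
    movl $1, %ebx
    lock cmpxchg8b (%rdi)
    jne .L1                 # ZF clear: the lock was held, try again
    popq %rbx
    popq %rbp
    ret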
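The same spin-lock idea can be sketched in C11 with an explicit unlock, which the assembly above leaves out. This is a swapped-in approach using <stdatomic.h> rather than cmpxchg8b directly; the type spin_mutex and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* state: 0 = free, 1 = held */
typedef struct { _Atomic uint64_t state; } spin_mutex;

/* One CAS attempt: succeeds only if the lock was free. */
static bool spin_mutex_trylock(spin_mutex *m)
{
    uint64_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &m->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void spin_mutex_lock(spin_mutex *m)
{
    while (!spin_mutex_trylock(m)) {
        /* Wait with plain loads until the lock looks free, so waiting
           threads do not keep issuing locked instructions. */
        while (atomic_load_explicit(&m->state, memory_order_relaxed) != 0) {
        }
    }
}

static void spin_mutex_unlock(spin_mutex *m)
{
    atomic_store_explicit(&m->state, 0, memory_order_release);
}
```

On x86-64 a compiler would typically implement the compare-exchange here with a lock cmpxchg on a 64-bit operand, so no EDX:EAX / ECX:EBX setup is needed.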
Upvotes: -1
Reputation: 31928
cmpxchg8b
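// 32-bit MSVC only: __asm blocks are not available when targeting x64.
// Returns the value that was in v before the operation.
__forceinline int64_t interlockedCompareExchange(volatile int64_t & v,int64_t exValue,int64_t cmpValue)
{
__asm {
mov esi,v                       ; esi = address of the 64-bit target
mov ebx,dword ptr exValue       ; ecx:ebx = value to store on success
mov ecx,dword ptr exValue + 4
mov eax,dword ptr cmpValue      ; edx:eax = expected value
mov edx,dword ptr cmpValue + 4
lock cmpxchg8b qword ptr [esi]
}
// no return statement: the result is left in edx:eax
}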
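A value-returning CAS with the same argument order can also be written without inline assembly. The sketch below assumes GCC or Clang and uses the older __sync_val_compare_and_swap builtin; the function name is hypothetical and a pointer replaces the C++ reference so the example is plain C.

```c
#include <stdint.h>

/* Stores exValue into *v if *v equals cmpValue.
   Always returns the value *v held before the operation, so the caller
   detects success by comparing the result with cmpValue. */
static int64_t interlocked_compare_exchange64(volatile int64_t *v,
                                              int64_t exValue,
                                              int64_t cmpValue)
{
    /* Note the builtin's argument order: (ptr, expected, desired). */
    return __sync_val_compare_and_swap(v, cmpValue, exValue);
}
```

Returning the old value rather than a success flag is convenient in retry loops, because a failed attempt already supplies the fresh value for the next comparison.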
Upvotes: 2