user4751448

Reputation: 11

Inline assembly: spinlock using cmpxchg on x86 (AMD64)

This is my first question here. I'm implementing a spinlock using the cmpxchg instruction, but the lock value doesn't change after cmpxchg executes, so there must be something wrong with my inline assembly.

This is my thread function code.

void *t_function(void *data){
    volatile int *t_cnt;

    t_cnt = (int *)data;

    /* Lock */
    while (CompareAndSet(&lock, 0, 1));

    /* Critical Section */
    for (; i < TEST_NUM; i++) {
        (*t_cnt)++;
    }

    /* Unlock */
    lock = 0;
}

This is my CompareAndSet function with inline assembly code.

unsigned int CompareAndSet(unsigned int *target, unsigned int expected, unsigned int update){
    unsigned int rv = *target;

    printf("lock : %d, expected : %d, update : %d\n", *target, expected, update);

    __asm__ __volatile__(
        "cmpxchg %0, %2\n\t"
        :"+r" (*target), "+a" (expected)
        :"r" (update)
    );

    printf("lock_after : %d\n", *target);
    return rv;
}

When I compile and run this, I get the output below. It seems that the lock doesn't change.

...
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [19] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [20] created.
lock : 0, expected : 0, update : 1
lock_after : 0
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [21] created.
Thread [22] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [23] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [24] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [25] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [26] created.
Thread [27] created.
Thread [28] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [29] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [30] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [31] created.
lock : 0, expected : 0, update : 1
lock_after : 0
lock : 0, expected : 0, update : 1
lock_after : 0
lock : 0, expected : 0, update : 1
lock_after : 0
cnt : 9478079

#define THREAD_NUM 32
#define TEST_NUM 10000000

Does anyone have an idea what's wrong with my assembly code?

From the 'AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions' PDF, I learned how to use the cmpxchg instruction. It says:

Mnemonic :
CMPXCHG reg/mem32, reg32

Opcode : 0F B1 /r

Description : Compare EAX register with a 32-bit register or memory location. If equal, copy the second operand to the first operand. Otherwise, copy the first operand to EAX.
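In C-like terms, I understand that description as something like the sketch below (just my reading of the manual text; cmpxchg_model is only an illustrative name, and the sketch ignores that the real instruction can be made atomic with the lock prefix):

    /* What CMPXCHG reg/mem32, reg32 does, per the manual's description.
     * "eax" models the EAX register; this sketch is not atomic. */
    unsigned int cmpxchg_model(unsigned int *dest, unsigned int src,
                               unsigned int eax /* expected value */){
        if (eax == *dest)
            *dest = src;   /* equal: copy the second operand into the first */
        else
            eax = *dest;   /* not equal: copy the first operand into EAX */
        return eax;        /* value of EAX after the instruction */
    }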

Upvotes: 1

Views: 1248

Answers (1)

Levon Minasian

Reputation: 581

  1. Your inline asm is assembled as AT&T syntax by default. Keep in mind that AT&T syntax reverses the operand order (src, dst). We'll continue with AT&T syntax.

  2. As @Jester mentioned, a spinlock is pointless if the lock variable isn't kept in memory, so you should change the constraint to "+m" (*target).

  3. Next, cmpxchg on its own is not atomic with respect to other cores, so it doesn't give you an atomic compare-and-swap by itself. To make it atomic, add the lock prefix, like this:

     lock cmpxchg %2, %0;
    

    (Note that the operand order changes because of AT&T syntax.)

  4. You were missing a "memory" clobber, which is needed to stop compile-time reordering with other statements.

  5. Your return value came from an earlier non-atomic load, so it could match expected even in cases where the CAS failed, or vice versa. You need to return the updated value of expected, and/or a bool computed from old_expected == new_expected or from FLAGS.

Summing up

int cmpxchg(int* val, int expected, int newval) {
    __asm__ volatile(
        "lock cmpxchg %2, %0\n\t"
        : "+m" (*val), "+a" (expected)
        : "r" (newval)
        : "rax", "memory"
    );
    return expected;
}

Big thanks to @PeterCordes for essential corrections. See also his answer on 'c inline assembly getting "operand size mismatch" when using cmpxchg' for another version of this wrapper that compiles with or without -masm=intel, using dialect alternatives, and for a version that uses the ZF output of lock cmpxchg.
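For reference, that ZF-returning variant could look roughly like the sketch below, which uses the GCC 6+ flag-output constraint "=@ccz"; cmpxchg_bool is just an illustrative name, not the exact code from the linked answer:

    int cmpxchg_bool(int* val, int expected, int newval) {
        int success;                       /* 1 if the swap happened */
        __asm__ volatile(
            "lock cmpxchg %3, %0\n\t"
            : "+m" (*val), "+a" (expected), "=@ccz" (success)
            : "r" (newval)
            : "memory"
        );
        return success;                    /* ZF after lock cmpxchg */
    }

With that variant, the locking loop in the question could simply be while (!cmpxchg_bool(&lock, 0, 1));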


But generally you shouldn't use inline assembly for this at all (https://gcc.gnu.org/wiki/DontUseInlineAsm), except as a learning exercise in inline asm, unless you're writing a freestanding program / kernel and want to follow the Linux kernel's style. Use the GNU C __atomic builtins instead:

  __atomic_compare_exchange_n(val, &expected, desired,
      false /* not weak */, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)

That has similar semantics to x86 locked instructions, except that it's not guaranteed to be a full barrier for other operations (like a fence), only to have seq_cst semantics itself. There can be a real difference on AArch64, for example. But it's still more than fine for a spinlock; taking the lock only needs an __ATOMIC_ACQUIRE CAS, and unlocking only needs a RELEASE store. Also, since you already call it in a retry loop, you'd use weak = true to get behaviour like atomic_compare_exchange_weak, as in the sketch below.
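Putting that together, a spinlock built on the builtins might look roughly like this (spin_lock / spin_unlock are illustrative names, not from the question's code):

    static int lock = 0;

    static void spin_lock(int *l) {
        int expected = 0;
        /* weak CAS is fine because we retry in a loop anyway */
        while (!__atomic_compare_exchange_n(l, &expected, 1, 1 /* weak */,
                                            __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
            expected = 0;        /* the builtin updates expected on failure */
            while (__atomic_load_n(l, __ATOMIC_RELAXED))
                ;                /* spin read-only until the lock looks free */
        }
    }

    static void spin_unlock(int *l) {
        __atomic_store_n(l, 0, __ATOMIC_RELEASE);  /* release store to unlock */
    }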

Upvotes: 1
