Reputation: 4833
I think I'm missing something obvious here. I have code like this:
int *a = <some_address>;
int *b = <another_address>;
// ...
// desired atomic swap of addresses a and b here
int *tmp = a;
a = b; b = tmp;
I want to do this atomically. I've been looking at __c11_atomic_exchange
and on OSX the OSAtomic methods, but nothing seems to perform a straight swap atomically. They can all write a value into 1 of the 2 variables atomically, but never both.
Any ideas?
Upvotes: 7
Views: 4256
Reputation: 53245
I think this is worth presenting up front. This spin lock approach I'm about to present is valid and correct, but it is also scheduler-unaware, meaning that it won't cooperate with the scheduler like a standard mutex, but rather it will fight to function during the time slices it is allowed to run in (randomly from its perspective) by the scheduler. So, it has some tradeoffs.
Here are some comments I've written below this answer regarding this:
I read the whole article by Linus that you linked to. He's criticizing a very specific benchmark spin lock code which is attempting to characterize the Linux scheduler by using spin locks on a timer. Linus is right though: the OS scheduler benchmark routine being described is broken because there is an unsynchronized race condition between the scheduler and that spin lock. That is unrelated to what I'm doing in my spin lock above. I understand the scheduler limitations and potential conflicts...
...with my spin lock implementation. Furthermore, I'm not writing the low-level locking primitives from scratch, which is the part people will almost certainly get wrong. Rather, my C++ compiler (g++) is. That critical section of code is written in the <stdatomic.h> header, as atomic_test_and_set and the type atomic_bool, which I use too. So, as long as the spin lock limitations are understood, what I have done above is a perfectly legitimate way to approach a problem like this. It should be benchmarked and compared to other approaches to know if it has merit over a standard C++ mutex, C mutex, or pthread mutex [in a given context]. For some of the modern C++ mutexes, see my article on my personal website here: https://gabrielstaples.com/cpp-mutexes-and-locks. Additionally, my answer here isn't meant to present the best approach. It's meant to present a valid approach. The user needs to understand the limitations. It may be used in other contexts too, such as with the Linux soft-real-time SCHED_RR round-robin scheduler, or in a kernel module, or elsewhere. There are certainly tradeoffs, but that doesn't mean alternative approaches like this should not be presented.
Now, if you want to see an interesting spin lock approach, here you go:
This code snippet is what we will be building up to.
For a basic spin lock (high speed, no-context-switch mutex) approach on a multi-core system, which is where spin locks might be most useful (they require multi-core, not just multi-threaded, systems to be functional), you can do an atomic swap of variable values (pointers in this case) like this:
int *a = &some_int1; // ptr a
int *b = &some_int2; // ptr b
spinlock_t spinlock = SPINLOCK_UNLOCKED;
// swap pointers `a` and `b` (note: swapping the pointers themselves, not the
// integer values they point to)
atomic_swap_int_ptr(&spinlock, &a, &b);
Important clarification by @Peter Cordes:
...out of any systems where you might consider using spinlocks, they only make any sense on multi-core systems, and it's better with pre-emptive multi-tasking.
Here are the details:
If your architecture supports lock-free atomic swaps using the array "pair" presented in the main answer here, then consider that. It may be lock-free and hence much more efficient than using a mutex or spin lock.
I wanted to delve into this a bit deeper though and try my hand at a spin lock. I have not speed profiled the various techniques (array "pair", mutex, and spin lock), but it would be very interesting and academic for me to do so to see how they compare in speed.
Anyway, as I was saying: you could also use a mutex.
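For reference, here is a minimal sketch of what a mutex-based swap could look like, assuming POSIX pthreads are available (the swap_int_ptr_mutex name and the file-scope mutex are illustrative, not part of the original question or answer):
#include <pthread.h>
// Hypothetical mutex-protected pointer swap, shown for comparison with the
// spin lock version developed below.
static pthread_mutex_t swap_mutex = PTHREAD_MUTEX_INITIALIZER;
void swap_int_ptr_mutex(int **ptr1, int **ptr2)
{
    pthread_mutex_lock(&swap_mutex);
    // critical section: swap the pointers (addresses) themselves
    int *temp = *ptr1;
    *ptr1 = *ptr2;
    *ptr2 = temp;
    pthread_mutex_unlock(&swap_mutex);
}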
Or, if you don't want your thread to go to sleep if the mutex isn't immediately available, you can improve efficiency by using a spin lock, since the number of times the spin lock has to spin is frequently so small that it is faster to spin than to put a thread to sleep and have a whole context switch.
C11 has a whole suite of atomic types and functions which allow you to easily implement your own spin lock to accomplish this.
Keep in mind, however, that a basic spin lock must not be used on a bare metal (no operating system) single core (processor) system, as it will result in deadlock if the main context takes the lock and then while the lock is still taken, an ISR fires and tries to take the lock. The ISR will block indefinitely.
To avoid this, more-complicated locking mechanisms must be used, such as a spin lock with a timeout, an ISR which auto-reschedules itself for a slightly later time if the lock is taken, automatic yielding, etc.
So, spin locks are best reserved for multi-core systems with an OS and scheduler.
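As a sketch of one such mitigation (not part of the original answer; it reuses the spinlock_t type and atomic_test_and_set() helper defined in the next section), a non-blocking "try lock" lets an ISR or task back off and reschedule itself instead of spinning forever:
/// Try once to take the lock. Returns true if we now own the lock, or false
/// if it was already taken. A caller (e.g., an ISR) can then reschedule
/// itself and retry later rather than spinning.
/// (Uses the spinlock_t type and atomic_test_and_set() helper defined below.)
bool spinlock_trylock(spinlock_t* spinlock)
{
    return atomic_test_and_set(spinlock) == SPINLOCK_UNLOCKED;
}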
Here is the spin lock implementation, using C11 _Atomic types:
#include <stdbool.h>
#include <stdatomic.h>
#define SPINLOCK_UNLOCKED 0
#define SPINLOCK_LOCKED 1
typedef volatile atomic_bool spinlock_t;
// This is how you'd create a new spinlock to use in the functions below.
// spinlock_t spinlock = SPINLOCK_UNLOCKED;
/// \brief Set the spin lock to `SPINLOCK_LOCKED`, and return the previous
/// state of the `spinlock` object.
/// \details If the previous state was `SPINLOCK_LOCKED`, you can know that
/// the lock was already taken, and you do NOT now own the lock. If the
/// previous state was `SPINLOCK_UNLOCKED`, however, then you have now locked
/// the lock and own the lock.
bool atomic_test_and_set(spinlock_t* spinlock)
{
bool spinlock_val_old = atomic_exchange(spinlock, SPINLOCK_LOCKED);
return spinlock_val_old;
}
/// Spin (loop) until you take the spin lock.
void spinlock_lock(spinlock_t* spinlock)
{
// Spin lock: loop forever until we get the lock; we know the lock was
// successfully obtained after exiting this while loop because the
// atomic_test_and_set() function locks the lock and returns the previous
// lock value. If the previous lock value was SPINLOCK_LOCKED then the lock
// was **already** locked by another thread or process. Once the previous
// lock value was SPINLOCK_UNLOCKED, however, then it indicates the lock
// was **not** locked before we locked it, but now it **is** locked because
// we locked it, indicating we own the lock.
while (atomic_test_and_set(spinlock) == SPINLOCK_LOCKED) {};
}
/// Release the spin lock.
void spinlock_unlock(spinlock_t* spinlock)
{
*spinlock = SPINLOCK_UNLOCKED;
}
Now, you can use the spin lock, for your example, in your multi-core system, like this:
/// Atomically swap two pointers to int (`int *`).
/// NB: this does NOT swap their contents, meaning: the integers they point to.
/// Rather, it swaps the pointers (addresses) themselves.
void atomic_swap_int_ptr(spinlock_t * spinlock, int ** ptr1, int ** ptr2)
{
spinlock_lock(spinlock);
// critical, atomic section start
int * temp = *ptr1;
*ptr1 = *ptr2;
*ptr2 = temp;
// critical, atomic section end
spinlock_unlock(spinlock);
}
// Demo to perform an address swap
int main(void)
{
    int some_int1 = 7;
    int some_int2 = 8;
    int *a = &some_int1;
    int *b = &some_int2;
    spinlock_t spinlock = SPINLOCK_UNLOCKED;
    // swap pointers `a` and `b` (note: swapping the pointers themselves, not
    // the integer values they point to)
    atomic_swap_int_ptr(&spinlock, &a, &b); // <=========== final answer
    return 0;
}
References:
1. atomic_bool and other types: https://en.cppreference.com/w/c/thread
2. The atomic_exchange() function: https://en.cppreference.com/w/c/atomic/atomic_exchange
   Note: you may use struct atomic_flag as well, and its corresponding atomic_flag_test_and_set() function, instead of creating your own atomic_test_and_set() function using atomic_exchange(), like I did above.
3. _Atomic types in C11: https://en.cppreference.com/w/c/language/atomic (see also the C++ <atomic> header)
4. Use atomic_flag instead of atomic_bool: https://en.cppreference.com/w/c/atomic/atomic_flag (emphasis added):
   "atomic_flag is an atomic boolean type. Unlike other atomic types, it is guaranteed to be lock-free. Unlike atomic_bool, atomic_flag does not provide load or store operations."
   Initialize it with ATOMIC_FLAG_INIT. Then use atomic_flag_test_and_set() instead of my custom atomic_test_and_set() above, and use atomic_flag_clear() to clear or "unset" the flag.
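For concreteness, here is a minimal sketch of that atomic_flag variant (the spinlock_flag_* names are just illustrative, not from the answer above):
#include <stdatomic.h>
// Spin lock built on atomic_flag, which is guaranteed to be lock-free.
// Create one like this:
// atomic_flag spinlock = ATOMIC_FLAG_INIT;
void spinlock_flag_lock(atomic_flag *spinlock)
{
    // atomic_flag_test_and_set() sets the flag and returns its previous
    // state: true means it was already locked, so keep spinning.
    while (atomic_flag_test_and_set(spinlock)) {};
}
void spinlock_flag_unlock(atomic_flag *spinlock)
{
    // Clear ("unset") the flag to release the lock.
    atomic_flag_clear(spinlock);
}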
Upvotes: 0
Reputation: 79003
It depends a bit on your architecture what is effectively and efficiently possible. In any case, you would need the two pointers that you want to swap to be adjacent in memory; the easiest way is to hold them in a struct, something like
typedef struct pair { void *a[2]; } pair;
Then, if you have a full C11 compiler, you can use _Atomic(pair) as an atomic type to swap the contents of such a pair: first load the actual pair into a temporary, construct a new one with the two values swapped, then do a compare-exchange to store the new value, and iterate if necessary:
#include <stdatomic.h>

// Atomically swap the two pointers stored in *myPair using a
// compare-exchange loop.
inline
void pair_swap(_Atomic(pair) *myPair) {
  pair actual = { 0 };
  pair future = { 0 };
  // If the compare-exchange fails, `actual` is reloaded with the current
  // contents of *myPair; build the swapped value from it and try again.
  while (!atomic_compare_exchange_weak(myPair, &actual, future)) {
    future.a[0] = actual.a[1];
    future.a[1] = actual.a[0];
  }
}
Depending on your architecture, this might or might not be realized with a lock-free primitive operation. Modern 64-bit architectures usually have 128-bit atomics, in which case this could compile down to just a few 128-bit assembly instructions. But this depends a lot on the quality of the implementation of atomics on your platform. P99 would provide you with an implementation that implements the "function-like" part of atomic operations and that supports 128-bit swap, when detected; on my machine (64-bit Intel Linux) this effectively compiles into a lock cmpxchg16b instruction.
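If you want to check whether your implementation can actually do this without a hidden lock, a small test like the following can tell you at run time (a sketch, assuming C11 <stdatomic.h>; with gcc on x86-64 you may additionally need to link with -latomic and/or compile with -mcx16):
#include <stdatomic.h>
#include <stdio.h>
typedef struct pair { void *a[2]; } pair; // same pair type as above
int main(void)
{
    _Atomic(pair) p;
    atomic_init(&p, (pair){ .a = { NULL, NULL } });
    // Reports whether operations on this two-pointer object avoid an
    // implementation-internal lock on this platform.
    printf("pair is lock-free: %d\n", (int)atomic_is_lock_free(&p));
    return 0;
}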
Upvotes: 9