vamirio-chan

Reputation: 351

shared_lock implemented with std::atomic

I can't afford to use std::shared_lock due to its weight, so instead I implemented what I think is a shared lock, but backed by an atomic value. Will this code work, or have I missed something?

Update: I added RAII classes around the atomic mutex.

#include <atomic>
#include <limits>
#include <immintrin.h>  // _mm_pause (x86)

namespace atm {
static const unsigned int g_unlocked = 0;
// The max value is reserved to mean "write-locked"; any other nonzero
// value is the count of readers currently holding the lock.
static const unsigned int g_exclusive = std::numeric_limits<unsigned int>::max();
using mutex_type = std::atomic<unsigned int>;

class exclusive_lock {
  public:
    exclusive_lock(mutex_type& mutex) : _mutex(mutex) {
      // Spin until we win the transition unlocked -> exclusive.
      unsigned int expected = g_unlocked;
      while (!_mutex.compare_exchange_weak(expected, g_exclusive)) {
        _mm_pause();
        expected = g_unlocked;  // the failed CAS overwrote expected; reset it
      }
    }
    ~exclusive_lock() { _mutex.store(g_unlocked, std::memory_order_release); }
  private:
    mutex_type& _mutex;
};

class shared_lock {
  public:
    shared_lock(mutex_type& mutex) : _mutex(mutex) {
      // Spin until no writer holds the lock, then increment the reader count.
      unsigned int expected = _mutex;
      while (expected == g_exclusive || !_mutex.compare_exchange_weak(expected, expected + 1)) {
        _mm_pause();
        expected = _mutex;  // reload; the writer may have released the lock
      }
    }
    ~shared_lock() { _mutex.fetch_sub(1, std::memory_order_release); }
  private:
    mutex_type& _mutex;
};
} // namespace atm
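For context, this is roughly how I intend to use the classes (a simplified sketch; g_mutex and g_value are just illustrative names, not part of the real code):

atm::mutex_type g_mutex{atm::g_unlocked};
int g_value = 0;  // data protected by g_mutex

void writer() {
    atm::exclusive_lock lock(g_mutex);  // blocks until no readers or writer
    ++g_value;
}

int reader() {
    atm::shared_lock lock(g_mutex);     // blocks only while a writer holds it
    return g_value;
}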

Upvotes: 1

Views: 344

Answers (1)

Peter Cordes

Reputation: 364088

For correctness, I think this looks reasonable; I don't see a problem. I might have missed something, but a seq_cst CAS is more than sufficient for acquiring a lock. It looks like you're avoiding integer wrap-around by reserving the max value as the special "write-locked" state.

The mutex = 0 unlock only needs to be release, not seq_cst; your updated code does this. (Same for the -= 1 shared unlock, but that won't make it more efficient on x86, only on weakly-ordered ISAs.) Also, compare_exchange_weak is totally fine: you're retrying in a loop anyway, so a spurious failure is no different from a failed compare.
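With explicit orderings, the exclusive acquire loop could look something like this (untested sketch: acquire on success is all you need, and relaxed on the failure path is fine since you retry anyway):

// Sketch: the acquire loop with weaker-than-seq_cst orderings.
unsigned int expected = g_unlocked;
while (!_mutex.compare_exchange_weak(expected, g_exclusive,
                                     std::memory_order_acquire,    // on success
                                     std::memory_order_relaxed)) { // on failure
    _mm_pause();
    expected = g_unlocked;
}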

If you're on x86, you'd normally want _mm_pause() inside your spin loop (which your updated code has), and possibly some kind of backoff to reduce contention if multiple threads are all trying to acquire the lock at once.
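A backoff version might look something like this (the doubling factor and the cap of 64 are arbitrary picks on my part, not tuned numbers):

// Sketch: exponential backoff between CAS attempts.
unsigned int expected = g_unlocked;
int backoff = 1;
while (!_mutex.compare_exchange_weak(expected, g_exclusive)) {
    for (int i = 0; i < backoff; ++i)
        _mm_pause();      // yield pipeline resources to the other hyperthread
    if (backoff < 64)
        backoff *= 2;     // back off further under sustained contention
    expected = g_unlocked;
}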

And usually you want to spin read-only until you see the lock available, not keep hammering with atomic RMWs. (See Does cmpxchg write destination cache line on failure? If not, is it better than xchg for spinlock?).
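For the exclusive path, that read-only spin (test-and-test-and-set) could look something like this sketch:

// Sketch: spin on a plain load, and only attempt the (cache-line-dirtying)
// CAS once the lock actually looks free.
unsigned int expected = g_unlocked;
while (!_mutex.compare_exchange_weak(expected, g_exclusive)) {
    do {
        _mm_pause();
    } while (_mutex.load(std::memory_order_relaxed) != g_unlocked);
    expected = g_unlocked;
}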

Also, short would be a strange choice for the counter (your updated code uses unsigned int, which is fine); if any size is going to perform worse than int, it's often short. But it's probably fine, and OK I guess if it helps the lock pack into the same cache line as the data you're modifying. (Although that cache line will then be the victim of false-sharing contention from other threads hammering on it trying to take the lock.)
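If you'd rather give the contended lock its own cache line instead, over-aligning it is one option (a sketch; 64 bytes is the typical x86 line size, hard-coded here rather than using std::hardware_destructive_interference_size, which isn't implemented everywhere):

// Sketch: keep the lock off the data's cache line so readers of the data
// don't suffer false sharing with threads spinning on the lock.
struct alignas(64) padded_mutex {   // 64 = typical x86 cache-line size
    atm::mutex_type m{atm::g_unlocked};
};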

Upvotes: 1
