Reputation: 721
I recently came across the following code while learning about Reentrant Locks in Lock-Free Concurrency:
class ReentrantLock32
{
    std::atomic<std::size_t> m_atomic;
    std::int32_t m_refCount;

public:
    ReentrantLock32() : m_atomic(0), m_refCount(0) {}

    void Acquire()
    {
        std::hash<std::thread::id> hasher;
        std::size_t tid = hasher(std::this_thread::get_id());
        // If this thread doesn't already hold the lock, spin until it does.
        if (m_atomic.load(std::memory_order_relaxed) != tid)
        {
            std::size_t unlockValue = 0;
            while (!m_atomic.compare_exchange_weak(
                unlockValue,
                tid,
                std::memory_order_relaxed,
                std::memory_order_relaxed))
            {
                unlockValue = 0;  // reset; CAS overwrote it with the observed value
                PAUSE();          // CPU pause/yield hint
            }
        }
        ++m_refCount;
        std::atomic_thread_fence(std::memory_order_acquire);
    }

    void Release()
    {
        std::atomic_thread_fence(std::memory_order_release);
        std::hash<std::thread::id> hasher;
        std::size_t tid = hasher(std::this_thread::get_id());
        std::size_t actual = m_atomic.load(std::memory_order_relaxed);
        assert(actual == tid);
        --m_refCount;
        if (m_refCount == 0)
        {
            m_atomic.store(0, std::memory_order_relaxed);
        }
    }
    // ...
};
However, it appears that there is a chance of stale data leading to multiple threads acquiring the lock, especially when thread contention is high.
!m_atomic.compare_exchange_weak(
    unlockValue,
    tid,
    std::memory_order_relaxed,
    std::memory_order_relaxed)
If two competing threads from different cores attempt to call compare_exchange_weak at the same time, isn't there a chance that the cache coherency protocol for the CPU could fail to invalidate the L1-cache before both threads acquire the lock?
Upvotes: 0
Views: 61
Reputation: 1692
If two competing threads from different cores attempt to call compare_exchange_weak at the same time, isn't there a chance that the cache coherency protocol for the CPU could fail to invalidate the L1-cache before both threads acquire the lock?
In short, no.
Compare-exchange (CAS) is a read-modify-write (RMW) operation. (Technically, in C++ it is an RMW only if it succeeds; a failed compare-exchange is just a load.) RMW operations guarantee that staleness will not occur. You mentioned the coherency protocol: an RMW requires exclusive access to the memory location, which means the core must first invalidate that cache line in every other core's cache. If a core could gain exclusive ownership without invalidating the other caches, that would be a hardware bug, and I think we'd all be in big trouble.
x86 gives lock cmpxchg, which clearly has exclusive access (that is exactly what the lock prefix does).
Arm essentially does what is known as a load-locked/store-conditional (LL/SC). See this example Arm7 implementation and documentation.
Cmpxchg Relaxed (32 bit):

_loop:
    ldrex   roldval, [rptr]
    mov     rres, 0
    teq     roldval, rold
    strexeq rres, rnewval, [rptr]
    teq     rres, 0
    bne     _loop
Basically it loads the memory location, but it "keeps an eye on it." It checks the value and, if the condition holds, attempts to store. However, if another core wrote to the location between the load and the store, the store fails. This example has a retry loop, so it is equivalent to compare_exchange_strong; without the loop it is like compare_exchange_weak.
However, it appears that there is a chance of stale data leading to multiple threads acquiring the lock, especially when thread contention is high.
C++ guarantees cache coherency with respect to a single atomic object. This means that stale reads, where an old value sits in a cache and never gets updated, are not allowed: a store to an atomic must eventually become visible to all other cores. A core can still read an old value if it happens to load at exactly the wrong moment, and such an old read could lead to a race condition just like a stale one, if not for the fact that compare_exchange_weak is an atomic operation: if the value has changed, the exchange simply fails.
High contention is still an undesirable scenario, since the resulting cache-line ping-ponging and CAS retries can cascade into a major traffic jam.
Upvotes: 1