Reputation: 721
I recently came across the following code while learning about Reentrant Locks in Lock-Free Concurrency:
class ReentrantLock32
{
    std::atomic<std::size_t> m_atomic;
    std::int32_t m_refCount;

public:
    ReentrantLock32() : m_atomic(0), m_refCount(0) {}

    void Acquire()
    {
        std::hash<std::thread::id> hasher;
        std::size_t tid = hasher(std::this_thread::get_id());
        // If this thread doesn't already hold the lock, spin until it does.
        if (m_atomic.load(std::memory_order_relaxed) != tid)
        {
            std::size_t unlockValue = 0;
            while (!m_atomic.compare_exchange_weak(
                unlockValue,
                tid,
                std::memory_order_relaxed,
                std::memory_order_relaxed))
            {
                unlockValue = 0;  // reset; CAS overwrote it with the observed value
                PAUSE();          // CPU pause/yield hint
            }
        }
        ++m_refCount;
        std::atomic_thread_fence(std::memory_order_acquire);
    }

    void Release()
    {
        std::atomic_thread_fence(std::memory_order_release);
        std::hash<std::thread::id> hasher;
        std::size_t tid = hasher(std::this_thread::get_id());
        std::size_t actual = m_atomic.load(std::memory_order_relaxed);
        assert(actual == tid);
        --m_refCount;
        if (m_refCount == 0)
        {
            m_atomic.store(0, std::memory_order_relaxed);
        }
    }
    // ...
};
However, it appears that there is a chance of stale data leading to multiple threads acquiring the lock, especially when thread contention is high.
!m_atomic.compare_exchange_weak(
    unlockValue,
    tid,
    std::memory_order_relaxed,
    std::memory_order_relaxed)
If two competing threads from different cores attempt to call compare_exchange_weak at the same time, isn't there a chance that the cache coherency protocol for the CPU could fail to invalidate the L1-cache before both threads acquire the lock?
Upvotes: 0
Views: 61
Reputation: 1692
If two competing threads from different cores attempt to call compare_exchange_weak at the same time, isn't there a chance that the cache coherency protocol for the CPU could fail to invalidate the L1-cache before both threads acquire the lock?
In short, no.
Compare-exchange (CAS) is a read-modify-write (RMW) operation. (Technically, in C++ it is an RMW only if it succeeds; a failed compare-exchange is just a load.) RMW operations guarantee that staleness will not occur. You mentioned the coherency protocol: an RMW requires exclusive access to the memory location, which means the core must first invalidate that cache line in every other core's cache. If a core could gain exclusive ownership without invalidating the other caches, that would be a hardware bug, and I think we'd all be in big trouble.
x86 gives lock cmpxchg, which clearly has exclusive access (that is exactly what the lock prefix does).
Arm essentially does what is known as a load-locked/store-conditional (LL/SC). See this example Arm7 implementation and documentation.
Cmpxchg Relaxed (32 bit):

_loop:
    ldrex   roldval, [rptr]
    mov     rres, 0
    teq     roldval, rold
    strexeq rres, rnewval, [rptr]
    teq     rres, 0
    bne     _loop
Basically it loads the memory location, but it "keeps an eye on it." It checks the value and, if the condition holds, attempts to store. However, if another core wrote to the location between the load and the store, the store fails. This example has a retry loop, so it is equivalent to compare_exchange_strong; without the loop it is like compare_exchange_weak.
However, it appears that there is a chance of stale data leading to multiple threads acquiring the lock, especially when thread contention is high.
C++ guarantees cache coherency with respect to a single atomic object. This means that stale reads, where an old value sits in a cache and never gets updated, are not allowed: a store to an atomic must eventually become visible to all other cores. A core can still read an old value if it happens to load at exactly the wrong moment, and such an old read could lead to a race condition just like a stale one, if not for the fact that compare_exchange_weak is an atomic operation: if the value has changed, the exchange simply fails.
High contention is still an undesirable scenario, since the resulting cache-line ping-ponging and CAS retries can cascade into a major traffic jam.
Upvotes: 1