xworder

Reputation: 91

Synchronising with mutex and relaxed memory order atomic

I have a shared data structure that is already internally synchronised with a mutex. Can I use an atomic with relaxed memory order to signal changes? A very simplified view of what I mean, in code:

Thread 1

shared_conf.set("somekey","someval");
is_reconfigured.store(true, std::memory_order_relaxed);

Thread 2

if (is_reconfigured.load(std::memory_order_relaxed)) {
  inspect_shared_conf();
}

Is it guaranteed that I'll see updates in the shared map? (The shared map itself internally synchronises every write/read call with a mutex.)

Upvotes: 3

Views: 991

Answers (2)

Nate Eldredge

Reputation: 57922

Your example code will work, and yes, you will see the updates. The relaxed ordering will give you the correct behavior. That said, it may not actually be optimal in terms of performance.

Let's look at a more concrete example, with the mutexes made explicit.

std::mutex m;
std::atomic<bool> updated;
foo shared;

void thr1() {
    m.lock();
    shared.write(new_data);
    m.unlock();
    updated.store(true, std::memory_order_relaxed);
}

void thr2() {
    if (updated.load(std::memory_order_relaxed)) {
        m.lock();
        data = shared.read();
        m.unlock();
    }
}

Informal explanation

m.lock() is an acquire operation and m.unlock() is release. This means that nothing prevents updated.store(true) from "floating" up into the critical section, past m.unlock() and even shared.write(). At first glance this seems bad, because the whole point of the updated flag was to signal that shared.write() had finished. But no actual harm occurs in that case, because thr1 still holds the mutex m, and so if thr2 starts trying to read the shared data, it will just wait until thr1 drops it.

What would really be bad is if updated.store() were to float all the way up past m.lock(); then thr2 could potentially see updated.load() == true and take the mutex before thr1. However, this cannot happen because of the acquire semantics of m.lock().

There could be some related issues in thr2 (a little more complicated because they would have to be speculative) but again we are saved by the same fact: the updated.load() can sink downward into the critical section, but not past it entirely (because m.unlock() is release).

But this is an instance where a stronger memory order on the updated operations, although seemingly more expensive, might actually improve performance. If the value true in updated becomes visible prematurely, then thr2 attempts to lock m while it is already locked by thr1, and so thr2 will have to block while it waits for m to become available. But if you changed to updated.store(true, std::memory_order_release) and updated.load(std::memory_order_acquire), then the value true in updated can only become visible after m is truly unlocked by thr1, and so the m.lock() in thr2 should always succeed immediately (ignoring contention by any other threads that might exist).


Proof

Okay, that was an informal explanation, but we know those are always risky when thinking about memory ordering. Let's give a proof from the formal rules of the C++ memory model. I will follow the C++20 standard because I have it handy, but I don't believe there are any significant relevant changes from C++17. See [intro.races] for definitions of the terms used here.

I claim that, if shared.read() executes at all, then shared.write(new_data) happens before it, and so by write-read coherence [intro.races p18] shared.read() will see the new data.

The lock and unlock operations on m are totally ordered [thread.mutex.requirements.mutex p5]. Consider two cases: either thr1's unlock precedes thr2's lock (Case I), or vice versa (Case II).

Case I

If thr1's unlock precedes thr2's lock in m's lock order, then there is no problem; they synchronize with each other [thread.mutex.requirements.mutex p11]. Since shared.write(new_data) is sequenced before thr1's m.unlock(), and thr2's m.lock() is sequenced before shared.read(), by chasing the definitions in [intro.races] we see that shared.write(new_data) indeed happens before shared.read().

Case II

Now suppose the contrary, that thr2's lock precedes thr1's unlock in m's lock order. Since locks and unlocks of the same mutex cannot interleave (that's the whole point of a mutex, to provide mutual exclusion), the lock total order on m must be as follows:

thr2: m.lock()
thr2: m.unlock()
thr1: m.lock()
thr1: m.unlock()

That means that thr2's m.unlock() synchronizes with thr1's m.lock(). Now updated.load() is sequenced before thr2 m.unlock(), and thr1 m.lock() is sequenced before updated.store(true), so it follows that updated.load() happens before updated.store(true). By read-write coherence [intro.races p17], updated.load() must not take its value from updated.store(true), but from some strictly earlier side effect in the modification order of updated; presumably its initialization to false.

We conclude that updated.load() must return false in this case. But if that were so, then thr2 would never have tried to lock the mutex in the first place. This is a contradiction, so Case II must never occur.

Upvotes: 3

Nicol Bolas

Reputation: 473192

Relaxed order means that ordering of atomics and external operations only happens with regard to operations on the specific atomic object (and even then, the compiler is free to re-order them outside of the program-defined order). Thus, a relaxed store has no relationship to any state in external objects. So a relaxed load will not synchronize with your other mutexes.

The whole point of acquire/release semantics is to allow an atomic to control visibility of other memory. If you want an atomic load to mean that something is available, it must be an acquire and the value it acquired must have been released.

Upvotes: 2
