How effective a barrier is an atomic write followed by an atomic read of the same variable?

Consider the following:

#include <atomic>

std::atomic<unsigned> var;
unsigned foo;
unsigned bar;

unsigned is_this_a_full_fence() {
     var.store(1, std::memory_order_release);
     var.load(std::memory_order_acquire);
     bar = 5;
     return foo;
}

My thought is that the dummy load of var should prevent the subsequent accesses of foo and bar from being reordered before the store.

It seems the code creates a barrier against reordering - and at least on x86, release and acquire require no special fencing instructions.

Is this a valid way to code a full fence (LoadStore/StoreStore/StoreLoad/LoadLoad)? What am I missing?

I think the release creates a LoadStore and StoreStore barrier. The acquire creates a LoadStore and LoadLoad barrier. And the dependency between the two variable accesses creates a StoreLoad barrier?

EDIT: change barrier to full fence. Make snippet C++.

Upvotes: 1

Views: 617

Answers (3)

JF Bastien

Reputation: 6853

One major issue with this code is that the store and subsequent load to the same memory location are clearly not synchronizing with any other thread. In the C++ memory model races are undefined behavior, and the compiler can therefore assume your code didn't have a race. The only way that your load could observe a value different from what was stored is if you had a race. The compiler can therefore, under the C++ memory model, assume that the load observes the stored value.

This exact atomic code sequence appears in my C++ standards committee paper "No Sane Compiler Would Optimize Atomics", under the heading "Redundant load eliminated". There's a longer CppCon version of this paper on YouTube.

Now imagine C++ weren't such a pedant, and the load / store were guaranteed to stay there despite the inherent racy nature. Real-world ISAs offer such guarantees which C++ doesn't. You provide some happens-before relationship with other threads with acquire / release, but you don't provide a unique total order which all threads agree on. So yes this would act as a fence, but it wouldn't be the same as obtaining sequential consistency, or even total store order. Some architectures could have threads which observe events in a well-defined but different order. That's perfectly fine for some applications! You'll want to look into IRIW (independent reads of independent writes) to learn more about this topic. The x86-TSO paper discusses it specifically in the context of the ad-hoc x86 memory model, as implemented in various processors.

Upvotes: 2

LWimsey

Reputation: 6647

Assuming that you run this code in multiple threads, using ordering like this is not correct because the atomic operations do not synchronize (see link below) and hence foo and bar are not protected.

But it still may have some value to look at guarantees that apply to individual operations.
As an acquire operation, var.load is not reordered (inter-thread) with the operations on foo and bar (hence #LoadStore and #LoadLoad, you got that right).
However, var.store is not protected against any reordering (in this context).

#StoreLoad reordering can be prevented by tagging both atomic operations seq_cst. In that case, all threads will observe the order as defined (still incorrect though because the non-atomics are unprotected).

EDIT
var.store is not protected against reordering because it acts as a one-way barrier only for operations that are sequenced before it (i.e. earlier in program order), and in your code there are no operations before that store.
var.load acts as a one-way barrier for operations that are sequenced after it (i.e. foo and bar).

Here is a basic example of how a variable (foo) is protected by an atomic store/load pair:

// thread 1
foo = 42;
var.store(1, std::memory_order_release);

// thread 2
while (var.load(std::memory_order_acquire) != 1);
assert(foo == 42);

Thread 2 only continues after it observes the value set by thread 1. The store is then said to have synchronized with the load, and the assert cannot fire.

For a complete overview, check Jeff Preshing's blog articles.

Upvotes: 0

Your code sequence is not atomic as a whole.

For example, a context switch could happen between the store and the load and some other thread would become scheduled (or is already running on some other core) and would then change the variable in between. Context switches and interrupts can happen at every machine instruction.

Is this a valid way to code a barrier

No, it is not. See also pthread_barrier_init(3p), pthread_barrier_wait(3p) and related functions.

You should read some pthread tutorial (in practice, C++11 threads are a tiny abstraction above them) and consider using mutexes.

Notice that std::memory_order mostly affects the current thread (and what it observes), and does not forbid it from being interrupted or context-switched.

See also this answer.

Upvotes: 0
