Reputation: 11
I have been exploring the world of C++ for a long time, and the following question caught my interest. I'm looking for a formal answer (with links to the C++ standard confirming it). I hope you will find this not-so-simple question interesting :)
So A is some global std::atomic<int> variable. Other threads only read it.
void foo() {
    // here A != 42
    A.store(42, std::memory_order::relaxed);
    auto a = A.load(std::memory_order::relaxed);
    if (a != 42) assert(!"What!?");
}
Some thread calls foo(). What language rules guarantee that A.store(42, ...) happens before (more correctly, is sequenced before) A.load(...) within the thread that called foo()?
And now let's add a certain B to the problem. It's some global std::atomic<int> variable too. Now the foo() function looks like this:
void foo() {
    // here A != 42
    A.store(42, std::memory_order::relaxed);
    B.store(0, std::memory_order::seq_cst);
    auto a = A.load(std::memory_order::relaxed);
    if (a != 42) assert(!"What!?");
}
On x64, I can be sure that such code is equivalent to this (except that we don't change B :D):
void foo() {
    // here A != 42
    A.store(42, std::memory_order::relaxed);
    std::atomic_thread_fence(std::memory_order::seq_cst);
    auto a = A.load(std::memory_order::relaxed);
    if (a != 42) assert(!"What!?");
}
But is there such a guarantee from the point of view of the C++ language standard?
Upvotes: 1
Views: 154
Reputation: 58673
What language rules guarantee that A.store(42, ...) happens before (more correctly, is sequenced before) A.load(...) within the thread that called foo()?
This one actually is simple :) See [intro.execution] p9: "Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated."
A.store(42, std::memory_order::relaxed) is a full-expression under [intro.execution] p5.6 ("an expression that is not a subexpression of another expression and that is not otherwise part of a full-expression"). auto a = A.load(std::memory_order::relaxed); is an init-declarator and thus a full-expression under p5.4.
Sequenced-before is program order, plain and simple. This is not the subtle part of memory ordering. Your A.store absolutely happens-before your A.load, and your assert can never ever fail.
The C++ memory model doesn't change the semantics of single-threaded programs - they are still the "natural" semantics - and it doesn't change the semantics of accesses within a single thread. Otherwise it'd be nearly impossible to program at all.
As another way to look at it: if you had written

int b;
b = 42;
int a = b;
assert(a == 42);

you would not even be asking the question, right? The semantics of atomic variables are strictly stronger than those of non-atomic variables, even with relaxed ordering. So anything that works (i.e. is well-defined) with non-atomic variables will still work if they are upgraded to atomic, no matter what memory_order you use.
The place where confusion sometimes arises is in realizing what happens-before really means. Some people, when they first learn about it, think that it makes memory ordering trivial, because they take "X happens before Y" to mean "X will always be observed before Y". It doesn't mean that. What it tells you is just that "Y will observe X".
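As a minimal sketch of that distinction (the variable names and the release/acquire pairing here are my own illustration, not from the question):

#include <atomic>
#include <cassert>

std::atomic<int> data{0};
std::atomic<bool> ready{false};

void producer() {
    data.store(1, std::memory_order_relaxed);
    ready.store(true, std::memory_order_release); // data.store happens-before this
}

void consumer() {
    if (ready.load(std::memory_order_acquire)) {
        // We observed the later write, so we must also observe everything
        // that happens-before it:
        assert(data.load(std::memory_order_relaxed) == 1); // cannot fire
    }
    // What is NOT promised: some other thread may see data == 1 while ready
    // is still false. Happens-before gives visibility ("Y will observe X"),
    // not a global order in which everyone observes the writes.
}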
As to your second question: no, in general an unrelated seq_cst access does not imply a seq_cst fence.
Not even on x86, for that matter. On x86, StoreStore, LoadLoad and LoadStore reordering are already impossible, so seq_cst just has to prevent StoreLoad reordering. This can be accomplished simply by ensuring that between every seq_cst store and every seq_cst load, there is at least one barrier instruction (e.g. mfence, though on x86 an unrelated locked RMW instruction acts as a barrier too). A compiler can do this in two ways:
1. A seq_cst load emits just an ordinary load; a seq_cst store emits an ordinary store followed by a barrier.
2. A seq_cst load emits a barrier followed by an ordinary load; a seq_cst store emits just an ordinary store.
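As a rough sketch, here is one way to express the two strategies at the C++ level with explicit fences (a hypothetical mapping I'm using purely for illustration; real compilers make this choice during instruction selection, and a relaxed access plus a fence is not literally how seq_cst is specified):

#include <atomic>

std::atomic<int> B{0};

// Strategy #1: the barrier follows the store; the load is plain.
void store_strategy1(int v) {
    B.store(v, std::memory_order_relaxed);               // plain mov on x86
    std::atomic_thread_fence(std::memory_order_seq_cst); // e.g. mfence
}
int load_strategy1() {
    return B.load(std::memory_order_relaxed);            // plain mov on x86
}

// Strategy #2: the barrier precedes the load; the store is plain.
void store_strategy2(int v) {
    B.store(v, std::memory_order_relaxed);               // plain mov on x86
}
int load_strategy2() {
    std::atomic_thread_fence(std::memory_order_seq_cst); // e.g. mfence
    return B.load(std::memory_order_relaxed);
}

Either way, a barrier ends up between every seq_cst store and every seq_cst load.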
So with a compiler following strategy #2, your store to B would just be an ordinary store instruction, with no barrier in sight. It has its usual release semantics, but could still be reordered with later instructions, such as your A.load().
Now, in your program that doesn't have much significance. And for that matter, neither would the fence. Putting a fence or other barrier between accesses to the same variable doesn't really achieve anything. If A were the only variable in your program shared between threads, then adding or removing the fence (or the B.store()) would not change the program's possible behaviors in any way. Fences are only useful when you place them between accesses to different variables. If there were other accesses in your program not shown, then putting a fence on that line could make a difference, but we would have to see the rest of the program to be able to say more.
For example, suppose you had
A.store(42, std::memory_order_relaxed);
B.store(0, std::memory_order_seq_cst);
C.store(17, std::memory_order_relaxed);
Since seq_cst implies release, the A and B stores cannot be reordered with each other. However, C can be reordered before B, and then before A. So another thread doing c = C.load(acquire); a = A.load(acquire); can get c == 17 && a == 0. This is even possible on x86: since the C++ memory model allows the behavior, the compiler is allowed to do the reordering and emit mov [c], 17; mov [a], 42; mov [b], 0. But if in place of the B.store() you put a release or seq_cst fence, then c == 17 && a == 0 is no longer possible on any conforming implementation.
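To make that concrete, here is a sketch of the fence variant, with a hypothetical reader thread added for illustration:

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> A{0}, C{0};

void writer() {
    A.store(42, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release); // or seq_cst
    C.store(17, std::memory_order_relaxed);              // cannot move above the fence
}

void reader() {
    if (C.load(std::memory_order_relaxed) == 17) {
        std::atomic_thread_fence(std::memory_order_acquire);
        // The fences synchronize ([atomics.fences]), so the store to A is
        // visible here; c == 17 && a == 0 is impossible.
        assert(A.load(std::memory_order_relaxed) == 42);
    }
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
}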
Upvotes: 2