Mati
Mati

Reputation: 781

Does C++ sequential consistency guarantee reading the latest value stored in memory?

I am quite new to C++ atomics and memory order and cannot wrap my head around one point. Consider the example taken directly from there: https://preshing.com/20120612/an-introduction-to-lock-free-programming/#sequential-consistency

std::atomic<int> X(0), Y(0);
int r1, r2;

void thread1()
{
    X.store(1);
    r1 = Y.load();
}

void thread2()
{
    Y.store(1);
    r2 = X.load();
}

There are sixteen possible outcomes in the total memory order:

  1. thread 1 store -> thread 1 load -> thread 2 store -> thread 2 load
  2. thread 1 store -> thread 2 store -> thread 1 load -> thread 2 load
  3. ...

Does sequential consistent program guarantee that if a particular store operation on some atomic variable happens before a load operation performed on the same atomic variable (but in another thread), the load will always see latest value stored (i.e., second point on the list above, where two stores happens before two loads in the total order)? In other words, if one put assert(r1 != 0 && r2 != 0) later in the program, would it be possible to fire the assert? According to the article such situation is not possible to take place. However, there is a quote taken from another thread where Anthony Williams commented on that: Concurrency: Atomic and volatile in C++11 memory model

"The default memory ordering of std::memory_order_seq_cst provides a single global total order for all std::memory_order_seq_cst operations across all variables. This doesn't mean that you can't get stale values, but it does mean that the value you do get determines and is determined by where in this total order your operation lies.

Who is right, who is wrong? Or maybe it's only my misunderstanding and both answers are correct.

Upvotes: 3

Views: 326

Answers (1)

Nate Eldredge
Nate Eldredge

Reputation: 58052

All the statements you quoted are correct. I think the confusion is coming from ambiguity in terms like "latest" and "stale", which could refer either to the sequentially consistent total order, or to ordering in real time. Those two orders do not have to be consistent with each other, and only the former is relevant to describing the program's observable behavior.

Let's start by looking at your program and then come back to the terminology afterwards.

There are sixteen possible outcomes in the total memory order:

No, there are only six. Let's call the operations XS, XL, YS, YL for the stores and loads to X and Y respectively. The total order has to be consistent with the sequencing (program order) of each thread, hence the name "sequential consistency". So XS has to precede YL in the total order, and YS has to precede XL.

Does sequential consistent program guarantee that if a particular store operation on some atomic variable happens before a load operation performed on the same atomic variable (but in another thread), the load will always see latest value stored (i.e., second point on the list above, where two stores happens before two loads in the total order)?

Careful, let's not use the phrase "happens before", as that refers to a different partial order in the memory model, which we do not have to consider when interested only in ordering of seq_cst atomic operations.

Sequential consistency does guarantee reading the "latest" value stored, where by "latest" we mean with respect to the total order. To be precise, each load L of a variable X takes its value from the unique store S to X which satisfies the following conditions: S precedes L, and every other store to X precedes S. So in your program, XL will return 1 if it follows XS in the total order, otherwise it will return 0.

Thus here are the possible total orders, and the corresponding values returned by XL and YL (your r2 and r1 respectively):

  1. XS, YL, YS, XL: here XL == 1 and YL == 0.

  2. XS, YS, YL, XL: here XL == 1 and YL == 1.

  3. XS, YS, XL, YL: here XL == 1 and YL == 1.

  4. YS, XS, YL, XL: here XL == 1 and YL == 1.

  5. YS, XS, XL, YL: here XL == 1 and YL == 1.

  6. YS, XL, XS, YL: here XL == 0 and YL == 1.

Note there are no orderings resulting in XL == 0 and YL == 0. That would require XL to precede XS, and YL to precede YS. But program order already requires that XS precedes YL and YS precedes XL. That would make a cycle, which by definition of a total order is not allowed.

In other words, if one put assert(r1 != 0 && r2 != 0) later in the program, would it be possible to fire the assert? According to the article such situation is not possible to take place.

I think you misread Preshing's article, or maybe you just have a typo in your question. Preshing is saying that r1 and r2 cannot both be zero, i.e., that assert(r1 != 0 || r2 != 0) would not fire. That is absolutely correct. But your assertion with && certainly could fire, in the case of orders 1 or 6 above.

"This doesn't mean that you can't get stale values, but it does mean that the value you do get determines and is determined by where in this total order your operation lies." [Anthony Williams]

Here Anthony means "stale" in the sense of real time. For instance, it is quite possible that XS executes at time 12:00:00.0000001 and XL executes at time 12:00:00.0000002, but XL still loads the value 0. There can be real-time "lag" before an operation becomes globally visible.

But if this happens, it means we are in a total ordering in which XL precedes XS. That makes the total ordering inconsistent with wall clock time, but that is allowed. What cannot happen is for such "lag" to reverse the ordering of visibility for two operations from the same thread. In this example, the machine might have to delay the execution of YL until time 12:00:00.0000003 so that it does not become visible before XS. The compiler would be responsible for inserting appropriate barrier instructions to ensure that this will happen.

(This sets aside the fact that on a modern CPU, it doesn't even make sense to talk about the "time" at which an instruction executes. An instruction can execute in several stages spanning many clock cycles, and even within a single core, this may be happening for several instructions at once. The machine is required to preserve the illusion of program order for the core observing its own operations, but not necessarily when they are observed by other cores.)

Because of the total order, it is actually valid to treat all seq_cst operations as happening at distinct ticks of some global "clock", where visibility and causality are preserved with respect to this clock. It's just that this clock may not always be running forwards in time with respect to the clock on your wall.

Upvotes: 5

Related Questions