How are memory_order_seq_cst fences useful anymore in C++20?

Question

Consider this code:

std::atomic x{ 0 };
std::atomic y{ 0 };
int a;
int b;

void thread1()
{
    //atomic op A
    x.store(1, std::memory_order_relaxed);

    //fence X
    std::atomic_thread_fence(std::memory_order_seq_cst);
    //sequenced-before P, thus in SC order X=>P

    //atomic op P
    a = y.load(std::memory_order_seq_cst);//0
    //reads-before(from-read) Q, thus in SC order P=>Q
}

void thread2()
{
    //atomic op Q
    y.store(1, std::memory_order_seq_cst);
    //sequenced-before B, thus in SC order Q=>B

    //atomic op B
    b = x.load(std::memory_order_seq_cst);
}

int main()
{
    std::thread t2(thread2);
    std::thread t1(thread1);
    t1.join();
    t2.join();
    assert(a == 1 || b == 1);//true?
    return 0;
}

The question is: is the assertion a == 1 || b == 1 always true in C++20?

I think it's true in C++17. The idea is this: first assume a get 0, then prove b must get 1. Let's split it into two parts:

1. Fence X precedes atomic op B in the SC total order

X is sequenced-before P, thus X=>P
P reads 0 and Q writes 1, thus P reads-before(from-read) Q, thus P=>Q
- I don't find the clause in the standard that guarantees this, but this paper said so. Please point out if I get it wrong. Edit: Now I know. It's the same rule about coherence-ordered before, as in Nate's answer.
Q is sequenced-before B, thus Q=>B

2. Atomic op B reads the value(1) written by atomic op A

C++17 has this in the standard:

For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there is a memory_order::seq_cst fence X such that A is sequenced before X and B follows X in S, then B observes either the effects of A or a later modification of M in its modification order.

It's exactly the case here. Op A and B are on atomic object x, where A stores 1 in x and B reads x's value; and there is the seq_cst fence X; A is sequenced before X and B follows X in S(the above); and there's no modification of M later than A. So B observes the value written by A. So b gets 1.

Is the above reasoning correct?

But in C++20, the above sited text about fences is removed (by P0668), and upgraded to "strengthened" seq_cst fences, which reads:

An atomic operation A on some atomic object M is coherence-ordered before another atomic operation B on M if

A is a modification, and B reads the value stored by A, or

A precedes B in the modification order of M, or

A and B are not the same atomic read-modify-write operation, and there exists an atomic modification X of M such that A reads the value stored by X and X precedes B in the modification order of M, or

there exists an atomic modification X of M such that A is coherence-ordered before X and X is coherence-ordered before B.

There is a single total order S on all memory_order::seq_cst operations, including fences, that satisfies the following constraints. First, if A and B are memory_order::seq_cst operations and A strongly happens before B, then A precedes B in S. Second, for every pair of atomic operations A and B on an object M, where A is coherence-ordered before B, the following four conditions are required to be satisfied by S:

if A and B are both memory_order::seq_cst operations, then A precedes B in S; and

if A is a memory_order::seq_cst operation and B happens before a memory_order::seq_cst fence Y , then A precedes Y in S; and

if a memory_order::seq_cst fence X happens before A and B is a memory_order::seq_cst operation, then X precedes B in S; and

if a memory_order::seq_cst fence X happens before A and B happens before a memory_order::seq_- cst fence Y , then X precedes Y in S.

I don't see how a==1 || b==1 is guaranteed any more. It's only about how SC fences are ordered with regard to other SC operations, or operations that are coherence-ordered with each other.

I can't find anything else about how SC fences interact with non-SC atomic operations in the C++20 standard, and a load that does not see a value from a store doesn't mean it's coherence-ordered before the store. (Or does it? If so that would fix the problem; see comments) 31.4 : 3.2 applies when A precedes B in the modification order of M, but reads aren't part of the modification order of an object, are they?

Is it I didn't look into the standard hard enough, or the code simply doesn't work any more in C++20? If it's the later, is the regression intended? (In P0668, they claim to "strengthen" fences).

Nate Eldredge · Accepted Answer

Yes, I think we can prove that a == 1 || b == 1 is always true. Most of the ideas here were worked out in comments by zwhconst and Peter Cordes, so I just thought I would write it up as an exercise.

(Note that X, Y, A, B below are used as the dummy variables in the standard's axioms, and may change from line to line. They do not coincide with the labels in your code.)

Suppose b = x.load() in thread2 yields 0.

We do have the coherence ordering that you asked about. Specifically, if b = x.load yields 0, then I claim that x.load() in thread2 is coherence ordered before x.store(1) in thread1, thanks to the third bullet in the definition of coherence ordering. For let A be x.load(), B be x.store(1), and X be the initialization x{0} (see below for quibble). Clearly X precedes B in the modification order of x, since X happens-before B (synchronization occurs when the thread is started), and if b == 0 then A has read the value stored by X.

(There is possibly a gap here: initialization of an atomic object is not an atomic operation (3.18.1p3), so as worded, the coherence ordering does not apply to it. I have to believe it was intended to apply here, though. Anyway, we could dodge the issue by putting x.store(0, std::memory_order_relaxed); in main before starting the threads, which would still address the spirit of your question.)

Now in the definition of the ordering S, apply the second bullet with A = x.load() and B = x.store(1) as before, and Y being the atomic_thread_fence in thread1. Then A is coherence-ordered before B, as we just showed; A is seq_cst; and B happens-before Y by sequencing. So therefore A = x.load() precedes Y = fence in the order S.

Now suppose a = y.load() in thread1 also yields 0.

By a similar argument to before, y.load() is coherence ordered before y.store(1), and they are both seq_cst, so y.load() precedes y.store(1) in S. Also, y.store(1) precedes x.load() in S by sequencing, and likewise atomic_thread_fence precedes y.load() in S. We therefore have in S:

x.load precedes fence precedes y.load precedes y.store precedes x.load

which is a cycle, contradicting the strict ordering of S.

How are memory_order_seq_cst fences useful anymore in C++20?

1. Fence X precedes atomic op B in the SC total order

2. Atomic op B reads the value(1) written by atomic op A

Answers (1)

Related Questions