Reputation: 31
// Thread 1:
x.store(1, std::memory_order_seq_cst); // A
y.store(1, std::memory_order_release); // B
// Thread 2:
r1 = y.fetch_add(1, std::memory_order_seq_cst); // C
r2 = y.load(std::memory_order_relaxed); // D
// Thread 3:
y.store(3, std::memory_order_seq_cst); // E
r3 = x.load(std::memory_order_seq_cst); // F
is allowed to produce r1 == 1 && r2 == 3 && r3 == 0, where A happens-before C, but C precedes A in the single total order C-E-F-A of memory_order_seq_cst (see Lahav et al).
Because I am a novice, I have been confused when looking at the memory model recently. I would like to ask how to treat this sentence "A happens-before C"? Shouldn't we generally think that the happens-before between different threads should have a while loop to do synchronization? Does the context here mean that if C happens to be loaded to the value stored in A (but it may not be loaded, because it is not a while loop)? In addition, how to understand A happens-before. C, but C precedes A in single total order. From the perspective of Thread B, A happens-before C, then the result of A execution C is visible, why does C precedes A appear?
Upvotes: 3
Views: 374
Reputation: 567
The following contents use many levels of markdown item lists which may be difficult to view on the mobile devices.
Recently I also read the same cppreference section and has the same question as you at first and got the idea after viewing related SO QAs and papers. I hope this answer can also help you understand what the cppreference says.
Your question seems to be a duplicate of QA_1 and QA_2, but both QAs seem to not read the original paper_1 which is referenced in the cppreference_1 and paper_2. So here I will give one answer mainly based on the paper_1 and partly based on its reference paper_3 from perspective of mathematics which may help grasp the inner ideas if having mathematics basic knowledge.
These are based on my understanding of the papers and related links. Please point out errors if any, thanks in advance.
S(k, m)
y
but not about x
, so A happens before C.
(This is definition of "happens before": a sequence composed of sb and sw.)S(m, o, p, k)
(here I coalesced different S(,)
by order).
S(p,k)
is due to "reads-before" (IMO, this means "reads-before-write" due to S(k, m)
and S(m, o, p, k)
where the former stays "happens-before" and the latter is the new total modification order.(A)-B-C-E-D-F-A
(here "(A)" means it runs before but observed later.)
Notice: maybe the above optimized memory model introduced in c++20 is still flawed. However, I'm not one compiler/computer architecture expert, so it's beyond my abilities to find the flaws.
The following needs some knowledge of discrete mathematics and I add some description for someone temporarily not familiar with them.
Notice: here only some symbols are rephrased, you may better view the original papers if has some questions about some terminology.
math
Part of the following primitive symbol definitions (mainly about relation) can be seen from "Notation 1" in the paper_1 like and "Definition 8" in the like (;)
paper_3 which is referenced in the paper_1.
Here I assume that they take same math primitive symbols in their the papers because paper_1 "Remark 1" says:
The reason we use Batty et al.’s version here is that it provides a cleaner starting point for our discussion, and our solution to the problems with C11’s SC semantics will build on it.
And after viewing the paper_3 footnotes in p5, it is mainly based on the ISO standard and maps them to the pure math which may be more intuitive if having better mathematics knowledge.
(
x
meaning.)
Here ;
is composition of relations(i.e. means one pipeline like
[A] -> R -> [B]
).
This means one AB pair has the relation R.
happens-before definition:
non-math
See this (better than cppreference ones because of the 2 levels) which is referenced in both above 2 QAs.
A is dependency-ordered before B, or
This implies including consume.
As the paper_1 says:
Besides the new SC and NO -THIN - AIR conditions, RC11 differs in a few other ways from C11. It does not support consume accesses
So the above cppreference definition may be more general than the following math definitions from the paper_1.
math
Next, we say that one event happens before (hb) another event if they are connected by a sequence of
sb
orsw
edges
+
means transitive (i.e. sequence) and U
means "or".above S(k,m)
has the "happens-before" relation because of the S(l,m)
synchronization(This is due to l,m
have data dependency implied (i.e. m read the write of l))
sw
non-math
think as the rel
,acq
sequence like the other section of cppreference_1 says.
If an atomic store in thread A is a release operation, an atomic load in thread B from the same variable is an acquire operation, and the load in thread B reads a value written by the store in thread A, then the store in thread A synchronizes-with the load in thread B.
math (see the paper_1 for more details)
Next, a release event a synchronizes with (sw) an acquire event b, whenever b (or, in case b is a fence, some sb-prior read) reads from the release sequence of a (or in case a is a fence, of some sb-later write).
?
implies "or" relation.sc
from the bottom-right figure in p7 of the paper_1.[F]
is always explicity placed by weak architecture like POWERPC referenced in the paper_1 which is related with rel
or acq
,etc (see this link from the paper_2 for how the compiler adds these fences).rs; rf
is implied by rel
and acq
rs
rs
, rf; rmw
is to get the sequence like "write,read,write".rel
and acq
.definition of sb
. Here I take the definition in the paper_3 to highlight that they are in the same thread.
non-math
From cppreference_2:
Sequenced before is an asymmetric, transitive, pair-wise relationship between evaluations within the same thread.
math
Then sb
means:
a strict partial order (i.e., irreflexive and transitive) between events from the same thread, that captures the program order.
and the thd
:
an equivalence relation on non-initial events that relates events from the same thread.
I
:
a set of initial events. Every initial event
is a non-atomic write of zero; that is, kind (e) = Wna and wval (e) = 0.Moreover, there is exactly one initial event per location.
definition of the single total order (S
in the paper_1)
non-math
as the paper_1 says:
there should be a strict total order, S, over all SC events, intuitively corresponding to the order in which these events are executed.
math from the paper_3
the S relation relates, in a strict total order, all and only the SC events in an execution; that is,
id
is "identity relation"S
order is runtime defined.acy(S)
(S is acyclic) is significant because this caused the above example failure.relations of "the single total order" and "happens-before" and reasons for changes
original version in c++11 from the paper_1:
(S1) S must include hb restricted to SC events
(formally:
);
[...] (these with no problems found by the paper unchanged)
non-math: S
needs to be conform to the hb (happens-before) order
math: see the above equation.
why the original c++11 model fails with specific examples?
Because it drops of sync
fences. Then it implies the weaker memory model.
quotes in the paper_1 "Fixing the Model":
Conversely, if we follow the leading sync convention, the hb-path (in Fig. 3) from k to m ending with an sw edge avoids the fence placed before m.
And see paper_2:
On the other hand, this outcome is allowed by the Power implementation. Power normally uses the "leading fence" convention for sequentially consistent atomics. ( See http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html ) This means that there is only an lwsync fence between Thread 1's instructions. This does not have the "cumulativity"/transitivity properties that would be required to make the store to x visible to Thread 3 under these circumstances.
isync
which probably used at the end to ensure the right context and avoid "intra-process reordering" by refetching. I never programmed POWERPC, so I only offer the related reference links and it's not from my experiences).The above 2 quotes mean same about how drops of synchronization fences occurs. (i.e. hwsync
by "Store Seq Cst" by sc
is avoided and only lwsync
exists)
from the paper_1:
So, if requiring that hb (on SC events) be included in S is too strong a condition, what should we require instead?
non-math
The essential insight is that, according to either compilation scheme, we know that a sync fence will necessarily exist between SC accesses a and b if the hb path from a to b starts and ends with an sb edge. Second, if a and b access the same location, then the hardware will preserve the ordering anyway.
See above "[F]" about the sync fence.
(S1fix) S must relate any two SC events that are related by hb, provided that the hb-path between the two events either starts and ends with sb edges, or starts and ends with accesses to the same location
sb
(because this won't cause cycle by the program order) are allowed now in c++20 instead of the whole hb
(happens-before).math
(formally:
, where
denotes hb edges between accesses to the same location).
hb
-> sb...
(i.e. the total modification order S
doesn't take the whole hb
in account now in c++20)
sb;hb
is dropped. (view Figure 3: S(k,l)
is sb
) (Specifically to say, hb=sw
). So the old happens-before A,C
is not take in account now in the total modification order.
Upvotes: 0