Reputation:
Many programming languages today have a happens-before relation and release+acquire synchronization operations (for example, Java and C++).
I would like to know if release+acquire can violate happens-before.
release+acquire and happens-before

Release/acquire establishes a happens-before relation between different threads: in other words, everything before the release in Thread 1 is guaranteed to be visible in Thread 2 after the acquire:
┌─────────────────────────────┐
│ Thread 1                    │
│ --------                    │
│ x = 1                       │   Everything
│ y = 2                       │   here...
│ write-release(ready = true) │
└─────────────────────────────┘
               │
               │  (happens-before)
               V
┌─────────────────────────────┐
│ Thread 2                    │
│ --------                    │
│ read-acquire(ready == true) │   ...is visible to
│ assert(x == 1)              │   everything
│ assert(y == 2)              │   here
└─────────────────────────────┘
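For concreteness, here is a minimal C++ sketch of the diagram above (the flag name ready and the spin-wait loop are my own illustration):

#include <atomic>
#include <cassert>
#include <thread>

int x = 0, y = 0;                 // regular (non-atomic) data
std::atomic<bool> ready{false};   // synchronization flag

void thread1() {
    x = 1;                                         // everything here...
    y = 2;
    ready.store(true, std::memory_order_release);  // ...is published by the release
}

void thread2() {
    while (!ready.load(std::memory_order_acquire)) {}  // read-acquire(ready == true)
    assert(x == 1);   // guaranteed: the writes above happen-before these reads
    assert(y == 2);
}

int main() {
    std::thread t1(thread1), t2(thread2);
    t1.join();
    t2.join();
}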
More than that, happens-before is a strict partial order. This means it is:
- transitive: Thread 2 is guaranteed to see not only the writes made by Thread 1, but also all writes by other threads that Thread 1 itself saw
- asymmetric: either a happens-before b, or b happens-before a; both at once isn't allowed

release/acquire might break happens-before
As we know from the IRIW litmus test, release/acquire can cause two threads to see writes from different threads in different orders (for C++ see also the last example here, and these two examples from the gcc wiki):
std::atomic<int> x{0}, y{0};

// Thread 1
x.store(1, std::memory_order_release);

// Thread 2
y.store(1, std::memory_order_release);

// Thread 3
assert(x.load(std::memory_order_acquire) == 1 && y.load(std::memory_order_acquire) == 0);

// Thread 4
assert(y.load(std::memory_order_acquire) == 1 && x.load(std::memory_order_acquire) == 0);
Here both asserts can pass, which means that Thread 3 and Thread 4 see the writes to x and y in different orders.
As I understand it, if these were ordinary variables, this would violate the asymmetry property of happens-before.
But because x and y are atomics, it's OK.
(BTW, I am not sure about that.)
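Here is a runnable sketch of that IRIW test, for anyone who wants to experiment (the harness and variable names are mine; whether the "both asserts pass" outcome is actually observed depends on the hardware, and it may take many runs):

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1x, r1y, r2x, r2y;   // values observed by the two reader threads

void writer_x() { x.store(1, std::memory_order_release); }
void writer_y() { y.store(1, std::memory_order_release); }
void reader1()  { r1x = x.load(std::memory_order_acquire);
                  r1y = y.load(std::memory_order_acquire); }
void reader2()  { r2y = y.load(std::memory_order_acquire);
                  r2x = x.load(std::memory_order_acquire); }

int main() {
    int iriw = 0;
    for (int i = 0; i < 100000; ++i) {
        x = 0; y = 0;
        std::thread t1(writer_x), t2(writer_y), t3(reader1), t4(reader2);
        t1.join(); t2.join(); t3.join(); t4.join();
        // The outcome in question: the two readers disagree on the order of the writes.
        if (r1x == 1 && r1y == 0 && r2y == 1 && r2x == 0)
            ++iriw;
    }
    std::printf("IRIW outcome observed %d times out of 100000\n", iriw);
}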
Nate Eldredge demonstrated in his answer that this IRIW example is OK.
But I still have a sneaking suspicion that there might be something similar to IRIW which would cause Thread 3 and Thread 4 to see happens-before-ordered regular writes in different orders; this would break happens-before (it wouldn't be transitive anymore).
In cppreference there is also this quote:

The implementation is required to ensure that the happens-before relation is acyclic, by introducing additional synchronization if necessary (it can only be necessary if a consume operation is involved, see Batty et al)

The quote hints that there might be cases when happens-before is violated and additional synchronization is required ("acyclic" means that happens-before forms a directed acyclic graph, which is equivalent to saying "strict partial order").
If it's possible, I would like to know what these cases are.
Since Java allows data races, I'm also interested in cases when happens-before is violated only in the presence of data races.
To give you an example, here is an explanation why sequentially consistent (SC) atomics cannot violate happens-before.
(A similar explanation for release/acquire atomics would be an answer to my question.)
By "violate happens-before
" I mean "to violate axioms of happens-before
, which is a strict partial order".
Strict partial orders correspond directly to directed acyclic graphs (DAGs).
Here is an example of a DAG from wiki (note that it has no cycles):
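Spelled out in notation (a small formal aside of mine, writing happens-before as \prec), the strict-partial-order axioms are:

\neg(a \prec a) \qquad \text{(irreflexivity)}
(a \prec b) \wedge (b \prec c) \Rightarrow (a \prec c) \qquad \text{(transitivity)}
(a \prec b) \Rightarrow \neg(b \prec a) \qquad \text{(asymmetry)}

Asymmetry follows from irreflexivity plus transitivity, and together these axioms are exactly the statement that the happens-before graph is a DAG.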
Let's show that with SC atomics the happens-before graph stays acyclic.
Remember that SC atomics happen in a single global order (the same for all threads).
Look at this happens-before graph:
Thread1          Thread2          Thread3
=======          =======          =======
   │                │                │
  W(x)              │                │
   ↓                │                │
 Sw(a) ──┐          │              W(y)
   │     │          │                ↓
   │     └──────> Sr(a)    ┌────── Sw(b)
   │                ↓      │         │
   │              Sr(b) <──┘         │
   │                ↓                │
   │              R(x)               │
   │                ↓                │
   │              R(y)               │
   │                │                │
   V                V                V
On the graph:
- W(x) and R(x) are regular actions: a write and a read of x
- Sw(a) and Sr(a) are SC atomics: a write and a read of a
- within each thread, actions happen in program order (sequenced-before order in C++): in the order they go in the code
- between threads, happens-before is established by the SC atomics

Note that the arrows on the graph always go downwards
=> the graph cannot have cycles
=> it's always a DAG
=> happens-before axioms cannot be violated
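As a sketch, the graph above corresponds to code along these lines (the variable names follow the graph; the concrete values are my own choice for illustration):

std::atomic<int> a{0}, b{0};
int x = 0, y = 0;   // regular (non-atomic) variables

// Thread 1
x = 1;                                   // W(x)
a.store(1, std::memory_order_seq_cst);   // Sw(a)

// Thread 3
y = 2;                                   // W(y)
b.store(1, std::memory_order_seq_cst);   // Sw(b)

// Thread 2
a.load(std::memory_order_seq_cst);       // Sr(a): the graph assumes it reads the value written by Sw(a)
b.load(std::memory_order_seq_cst);       // Sr(b): likewise for Sw(b)
int r1 = x;                              // R(x): safe, because W(x) happens-before this read
int r2 = y;                              // R(y): safe, because W(y) happens-before this read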
The same proof doesn't work for release/acquire
atomics because (as far as I understand) they don't happen in a global order => a HB arrow between Sw(a)
and Sr(a)
might go upwards => a cycle is possible. (I'm not sure about that)
Upvotes: 5
Views: 686
Reputation:
Answer: no, release/acquire cannot break happens-before.
The proof is given by Nate Eldredge in this comment:
Oh, indeed. And actually, I might see how to prove it. The HB relation is transitive by its construction. The sequencing relation is acyclic, so if there is a cycle in HB, it must contain at least one synchronizes-with step, which is, roughly, an acquire load L that takes its value from a release store S. But because of the cycle, L happens before S. Therefore by p14, L must take its value from some store other than S, contradiction. Not quite a proof because I have oversimplified the definition of synchronizes-with from atomics.order p2, but it's a start.
I just would like to put it in a separate answer to make it easier for people to notice.
Plus here are some additional explanations from me (maybe they will make it easier for some people to understand).
First of all, if we use only release/acquire atomics and non-atomic memory accesses, then happens-before is transitive (and acyclic) by construction.
Similarly to the SC graph, in the case of release/acquire the edges also always point downwards:
Thread1          Thread2          Thread3          Thread4
=======          =======          =======          =======
   │                │                │                │
   │         ┌── Wrel(a) ──┐        │                │
Racq(a) <────┘      │      │  ┌── Wrel(b) ──┐         │
   ↓                │      │  │      │      └────> Racq(b)
Racq(b) <───────────│──────│──┘      │                ↓
   │                │      └─────────│───────────> Racq(a)
   │                │                │                │
   V                V                V                V
(BTW notice that this graph is different from SC: Thread 1 and Thread 4 see Wrel(a) and Wrel(b) in different orders. But edges point downwards nonetheless)
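In code, this graph corresponds to the IRIW shape from the question, just with the variables renamed to a and b; a sketch (each arrow in the graph exists only when the corresponding acquire load actually reads the released value):

std::atomic<int> a{0}, b{0};

// Thread 2
a.store(1, std::memory_order_release);   // Wrel(a)

// Thread 3
b.store(1, std::memory_order_release);   // Wrel(b)

// Thread 1
a.load(std::memory_order_acquire);       // Racq(a): synchronizes with Wrel(a) if it reads 1
b.load(std::memory_order_acquire);       // Racq(b): synchronizes with Wrel(b) if it reads 1

// Thread 4
b.load(std::memory_order_acquire);       // Racq(b)
a.load(std::memory_order_acquire);       // Racq(a)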
Edges from Wrel(x) to Racq(x) always point downwards because Racq(x) sees Wrel(x) and everything before Wrel(x) as completed (see the Notes at the end of the answer).
(In the C++ spec this is called synchronizes-with; you can learn more about it here.)
As a result, similarly to the SC graph, all edges always point downwards
=> the graph cannot have cycles
=> it's always a DAG
=> happens-before axioms cannot be violated
Actually, the happens-before produced by release/acquire atomics is basically the original happens-before introduced by Leslie Lamport in Time, Clocks and the Ordering of Events in a Distributed System.
I really recommend the paper to everyone interested in HB: Lamport's explanations are short and clear, and the idea is really cool.
Let's also demonstrate with a picture why cycles aren't possible.
This is what a cycle would look like:
Thread1          Thread2          Thread3
=======          =======          =======
   │                │                │
Racq(a) <───────────│────────────────│────────┐
   ↓                │                │        │
Wrel(b) ──┐         │                │        │
   │      │         │                │        │
   │      └────> Racq(b)             │        │
   │                ↓                │        │
   │             Wrel(c) ──┐         │        │
   │                │      │         │        │
   │                │      └────> Racq(c)     │
   │                │                ↓        │
   │                │             Wrel(a) ────┘
   │                │                │
   V                V                V
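In code, the hypothetical cycle would correspond to something like this sketch (each thread acquires one variable and then releases the next; the closing Wrel(a)->Racq(a) edge would require Racq(a) to read the value written by Wrel(a)):

std::atomic<int> a{0}, b{0}, c{0};

// Thread 1
int r1 = a.load(std::memory_order_acquire);  // Racq(a): would have to read the 1 stored below...
b.store(1, std::memory_order_release);       // Wrel(b)

// Thread 2
int r2 = b.load(std::memory_order_acquire);  // Racq(b)
c.store(1, std::memory_order_release);       // Wrel(c)

// Thread 3
int r3 = c.load(std::memory_order_acquire);  // Racq(c)
a.store(1, std::memory_order_release);       // Wrel(a): ...but this store happens-after Racq(a) via the chain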
Within each thread, happens-before is the order of actions in the source code (it's called sequenced-before in C++ and program order in Java).
Clearly, no HB cycle can be built from these intra-thread edges alone.
It means that the edge which "goes back" and closes the cycle must be a synchronizes-with edge, like the Wrel(a)->Racq(a) edge in the graph above.
Notice a contradiction:
- Wrel(a) must complete before Racq(a), because Racq(a) reads the value written by Wrel(a)
- Racq(a) must complete before Wrel(a), because there is a chain Racq(a)->Wrel(b)->Racq(b)->Wrel(c)->Racq(c)->Wrel(a) where every Wrel(x) (and everything before it) completes before Racq(x) reads it

This means the Wrel(a)->Racq(a) edge is not allowed => cycles aren't possible.
In terms of the C++ memory model it violates coherence requirements:
The value of an atomic object M, as determined by evaluation B, shall be the value stored by some side effect A that modifies M, where B does not happen before A.
Notes.
I stated that:

Racq(x) sees Wrel(x) and everything before Wrel(x) as completed

but in the C++ standard that isn't stated directly. Instead it has this:
- happens-before is what defines what a read sees
- the relation between Racq(x) and Wrel(x) is called synchronizes-with
- happens-before is built from synchronizes-with and a huge amount of other rules and orders

It's possible to deduce my statement from the C++ standard, although it might not be easy. (That is why I recommend reading this article instead.)
I used the statement because it concisely describes what memory barrier instructions do, and this is how happens-before can be (and probably is) easily implemented.
So often a memory barrier instruction is all we need to implement happens-before with all its mathematical properties.
For an overview of these instructions on different CPUs I would recommend the section on that in Memory Barriers: a Hardware View for Software Hackers by Paul E. McKenney.
(For example, memory barriers in PowerPC basically work the same as release/acquire
atomics in C++)
Upvotes: 0
Reputation: 21
The following fragment of the accepted answer has an error:
Thread1          Thread2          Thread3
=======          =======          =======
   │                │                │
   │         ┌── Wrel(a) ───┐        │
   │         │      │       │        │
Racq(a) <────┘      │       │        │
   ↓                ↓       │        │
   │         ┌── Wrel(b) ─┐ │        │
   │         │      │     │ │        │
Racq(b) <────┘      │     └─│───> Racq(b)
   │                │       │        ↓
   │                │       └───> Racq(a)
   │                │                │
   V                V                V
(BTW notice that this graph is different from SC: Thread 1 and Thread 3 see Wrel(a) and Wrel(b) in different orders. But edges point downwards nonetheless)
Thread 1 and Thread 3 can see Wrel(a) and Wrel(b) in different orders only when Wrel(a) and Wrel(b) come from different threads.
This is the so-called IRIW (independent reads of independent writes) test.
So the fragment should be changed to this:
Thread1          Thread2          Thread3          Thread4
=======          =======          =======          =======
   │                │                │                │
   │         ┌── Wrel(a) ──┐        │                │
Racq(a) <────┘      │      │  ┌── Wrel(b) ──┐         │
   ↓                │      │  │      │      └────> Racq(b)
Racq(b) <───────────│──────│──┘      │                ↓
   │                │      └─────────│───────────> Racq(a)
   │                │                │                │
   V                V                V                V
(BTW notice that this graph is different from SC: Thread 1 and Thread 4 see Wrel(a) and Wrel(b) in different orders. But edges point downwards nonetheless)
By the way, I could have made this a suggested edit to the accepted answer.
I didn't do it because it's a pretty specialized topic, so a reviewer might not know all the nuances and details, and could reject the edit simply because it's not obvious that it really fixes an error.
But I think this is an error worth mentioning and fixing, so I put the fix in this answer instead.
Upvotes: 2
Reputation: 58673
Happens-before is the transitive closure of sequenced-before and synchronizes-with. Sequenced-before is just program order within each thread, and synchronizes-with occurs when an acquire load takes its value from a release store. So in your program, in order to have both assertions pass, the following relations must hold:
1. T3.x==1 happens before T3.y==0 (sequencing)
2. T4.y==1 happens before T4.x==0 (likewise)
3. T1.x=1 happens before T3.x==1 (acquire load takes its value from release store)
4. T2.y=1 happens before T4.y==1 (likewise)
5. T1.x=1 happens before T3.y==0 (for transitivity)
6. T2.y=1 happens before T4.x==0 (likewise)

You may check that this satisfies all the axioms of a partial ordering (antisymmetric and transitive) as well as all the C++ coherence rules implied by both assertions passing. For instance, it must not be the case that T2.y=1 happens before T3.y==0, and indeed there is no such relation in our ordering. But it is also not true that T3.y==0 happens before T2.y=1, and there is nothing wrong with that; it is a partial order after all. T2.y=1 and T3.y==0 are simply unordered.
Since there is a valid happens-before ordering that is consistent with both assertions passing, it is possible that when you run the program, both assertions pass.
It is true that if some happens-before relation existed between T3.y==0
and T2.y=1
in either direction, and likewise between T4.x==0
and T1.x=1
, then every combination would result in some violation of the rules: either a violation of coherence, or a cycle in the partial order. But again, it is perfectly fine for them to be unordered, and then no violation occurs.
If the loads and stores were all relaxed, or even if x and y were not atomic at all, then relations 3 and 4 above would not be implied by any rule, and so the happens-before ordering would become simply:

1. T3.x==1 happens before T3.y==0 (sequencing)
2. T4.y==1 happens before T4.x==0 (likewise)

This is also consistent with both assertions passing. (In the non-atomic case, the fact that T1.x=1 is unordered with the loads of x would mean there is a data race, and therefore the behavior is undefined in C++. In a different language like Java, we might have defined behavior saying that both loads succeed and return either 0 or 1, but we could still have both assertions pass.) If you think that changing the program to use non-atomic loads and stores would prevent both assertions from passing, you are mistaken.
So having acquire and release actually strengthens the ordering; there are more relations that must hold, and the behavior of the program becomes better defined.
Upvotes: 5
Reputation: 10882
I would like to know if release+acquire can violate happens-before.
A happens-before relationship cannot be "violated", as it is a guarantee: if you established it in a correct way, it will be there, with all its implications (unless there is a bug in the compiler).
However, establishing just any happens-before relationship doesn't guarantee that you've avoided all possible race conditions. You need to establish carefully chosen relationships between the relevant operations, ones that eliminate every scenario in which a data race is possible.
Let's review this code snippet:
std::atomic<int> x{0}, y{0};
bool x_then_y = false, y_then_x = false;

// Thread 1
x.store(1, std::memory_order_release);

// Thread 2
y.store(1, std::memory_order_release);

// Thread 3
while (x.load(std::memory_order_acquire) != 1);
if (y.load(std::memory_order_acquire) == 0) {
    x_then_y = true;
}

// Thread 4
while (y.load(std::memory_order_acquire) != 1);
if (x.load(std::memory_order_acquire) == 0) {
    y_then_x = true;
}
As you see, T1
and T2
update x
and y
using memory_order_release
memory ordering. T3
and T4
use memory_order_acquire
.
Now, to quote cppreference:
Release-Acquire ordering
If an atomic store in thread
A
is taggedmemory_order_release
and an atomic load in threadB
from the same variable is taggedmemory_order_acquire
, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of threadA
, become visible side-effects in threadB
. That is, once the atomic load is completed, threadB
is guaranteed to see everything threadA
wrote to memory.
The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.
It explicitly says that a relation is established only between the store-load pairs in the corresponding threads; no relation between load-load pairs is formed.
Back to our example: the only happens-before edges run from the store in T1 to the acquire loads of x that read it, and from the store in T2 to the acquire loads of y that read it. There is no relation between T1 and T2, or between T3 and T4, and no total order. T3 and T4 are free to see the side-effects from T1 and T2 in any order.
That code snippet was used on cppreference to illustrate a case when memory_order_acquire + memory_order_release may be too weak to avoid a data race (but that doesn't mean that this example violates happens-before or the memory model; it's just that having happens-before may not be sufficient to avoid all possible data races!).
If memory_order_seq_cst was used for all operations in the example above, then additional ordering would also have been established between the load operations in T3 and T4, because all seq_cst operations take place in a single total order.
Meaning that at least one of the threads T3 and T4 is guaranteed to have its second load (y.load or x.load) return 1, ensuring that x_then_y and y_then_x cannot both be true at the same time.
Note: in Java, atomic and volatile operations always work similarly to memory_order_seq_cst, for simplicity (sacrificing performance in some cases).
Upvotes: 0