Reputation: 5073
Let's consider the following two-thread concurrent program in C++:
x, y are globals, r1, r2 are thread-local, and stores and loads to int are atomic. Memory model = C++11.
int x = 0, int y = 0
r1 = x | r2 = y
y = r1 | x = r2
A compiler is allowed to compile it as:
int x = 0, int y = 0
r1 = x | r2 = 42
y = r1 | x = r2
| if(y != 42)
| x = r2 = y
And, while it is intra-thread consistent, it can produce wild results, because an execution of that program can end with (x, y) = (42, 42).
This is called the out-of-thin-air values problem. It exists, and we have to live with it.
My question is: Does a memory barrier prevent a compiler from doing wild optimizations that result in out-of-thin-air values?
For example:
[fence] = atomic_thread_fence(memory_order_seq_cst);
int x = 0, int y = 0
r1 = x | r2 = y
[fence] | [fence]
y = r1 | x = r2
Upvotes: 4
Views: 3678
Reputation: 364039
Related: my answer on What formally guarantees that non-atomic variables can't see out-of-thin-air values and create a data race like atomic relaxed theoretically can? explains in more detail that the formal rules of the C++ relaxed-atomic memory model don't exclude "out of thin air" values; they exclude them only in a note. This is a problem only for formal verification of programs using mo_relaxed, not for real implementations. Even non-atomic variables are safe from this, if you avoid undefined behaviour (which the code in this question didn't).
You have data-race Undefined Behaviour on x and y because they're non-atomic variables, so the C++11 standard has absolutely nothing to say about what's allowed to happen.
It would be relevant to look at this for older language standards without a formal memory model, where people did threading anyway using volatile or plain int plus compiler and asm barriers, and where behaviour could depend on compilers working the way you expect in a case like this. But fortunately the bad old days of "happens to work on current implementations" threading are behind us.
Barriers are not helpful here, because there is nothing to create synchronization; as @davmac explains, nothing requires the barriers to "line up" in the global order of operations. Think of a barrier as an operation that makes the current thread wait for some or all of its own previous operations to become globally visible; barriers don't directly interact with other threads.
Out-of-thin-air values are one thing that can happen as a result of that undefined behaviour; the compiler is allowed to do software value-prediction on non-atomic variables, and to invent writes to objects that will definitely be written anyway. If there were a release-store, or a relaxed store plus a barrier, the compiler might not be allowed to invent writes before it, because that could create stores that other threads are allowed to observe when the source program says they can't be.
In general, from a C++11 language-lawyer perspective, there's nothing you can do to make this program safe (other than a mutex, or hand-rolled locking with atomics, to prevent one thread from reading x while the other is writing it). Except maybe defeating auto-vectorization and such, if you were counting on other uses of these variables being aggressively optimized.
atomic_int x = 0, y = 0
r1 = x.load(mo_relaxed) | r2 = y.load(mo_relaxed)
y.store(r1, mo_relaxed) | x.store(r2, mo_relaxed)
Value-prediction could speculatively get a future value for r2 into the pipeline before thread 2 sees that value from y, but the prediction can't actually become visible to other threads until the hardware or software knows for sure that it was correct. (Otherwise that would be inventing a write.)
e.g. thread 2 is allowed to compile as
r2 = y.load(mo_relaxed);
if (r2 == 42) {          // control dependency, not a data dependency
    x.store(42, mo_relaxed);
} else {
    x.store(r2, mo_relaxed);
}
But as I said, x = 42; can't become visible to other threads until it's non-speculative (whether the speculation is in hardware or in software), so value prediction can't invent values that other threads can see. The C++11 standard guarantees (only in a non-normative note) that atomics shouldn't take on out-of-thin-air values.
I don't know of / can't think of any mechanism by which a store of 42 could actually become visible to other threads before the y.load saw an actual 42 (i.e. LoadStore reordering of a load with a later dependent store). I don't think the C++ standard formally guarantees that, though. Maybe really aggressive inter-thread optimization could, if the compiler can prove that r2 will always be 42 in some cases, and remove even the control dependency?
An acquire-load or release-store would definitely be sufficient to block causality violations. This isn't quite mo_consume, because r2 is used as a value, not as a pointer.
Upvotes: 3
Reputation: 20631
Not by itself. In your example, there is nothing synchronising the two threads. In particular, the fences in the two threads do not cause the threads to synchronise at that point; for example, you might get the following sequence:
(Thread #1) | (Thread #2)
r1 = x |
[fence] |
y = junk temporary |
| r2 = y // junk!
| [fence]
| x = r2
y = r1 |
The simplest way to avoid out-of-thin-air results is to use atomic integers: if x and y are atomic, then they cannot have "out of thin air" values:
std::atomic_int x{0}, y{0};
int r1 = x; | int r2 = y;
y = r1; | x = r2;
Upvotes: 0