Reputation: 6008
In a multi-producer, multi-consumer situation, if producers are writing into int a and consumers are reading from int a, do I need memory barriers around int a?
We all learned that shared resources should always be protected, and that the standard does not guarantee proper behavior otherwise.
However, on cache-coherent architectures visibility is ensured automatically, and the MOV operation on 8-, 16-, 32- and 64-bit variables is guaranteed to be atomic.
Therefore, why protect int a at all?
Upvotes: 3
Views: 309
Reputation: 490188
At least in C++11 (or later), you don't need to (explicitly) protect your variable with a mutex or memory barriers.
You can use std::atomic to create an atomic variable. Changes to that variable are guaranteed to propagate across threads.
#include <atomic>
#include <iostream>

std::atomic<int> a;
// thread 1:
a = 1;
// thread 2 (later):
std::cout << a; // shows `a` has the value 1.
Of course, there's a little more to it than that--for example, there's no guarantee that std::cout works atomically, so you probably will have to protect that (if you try to write from more than one thread, anyway).
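For instance, here is a minimal sketch of one common way to do that (the mutex and function names are purely illustrative): guard the stream with a std::mutex so only one thread writes at a time.
#include <iostream>
#include <mutex>
#include <thread>

std::mutex cout_mutex; // illustrative name: serializes access to std::cout

void print_value(int v) {
    std::lock_guard<std::mutex> lock(cout_mutex); // one writer at a time
    std::cout << "value: " << v << '\n';
}

int main() {
    std::thread t1(print_value, 1);
    std::thread t2(print_value, 2);
    t1.join();
    t2.join();
}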
It's then up to the compiler/standard library to figure out the best way to handle the atomicity requirements. On a typical architecture that ensures cache coherence, it may mean nothing more than "don't allocate this variable in a register". It could impose memory barriers, but is only likely to do so on a system that really requires them.
On real world C++ implementations where volatile worked as a pre-C++11 way to roll your own atomics (i.e. all of them), no barriers are needed for inter-thread visibility, only for ordering wrt. operations on other variables. Most ISAs do need special instructions or barriers for the default memory_order_seq_cst.
On the other hand, explicitly specifying memory ordering (especially acquire and release) for an atomic variable may allow you to optimize the code a bit. By default, an atomic uses sequential ordering, which basically acts like there are barriers before and after access--but in a lot of cases you only really need one or the other, not both. In those cases, explicitly specifying the memory ordering can let you relax the ordering to the minimum you actually need, allowing the compiler to improve optimization.
(Not all ISAs actually need separate barrier instructions even for seq_cst; notably AArch64 just has a special interaction between stlr and ldar to stop seq_cst stores from reordering with later seq_cst loads, on top of acquire and release ordering. So it's as weak as the C++ memory model allows, while still complying with it. But weaker orders, like memory_order_acquire or relaxed, can avoid even that blocking of reordering when it's not needed.)
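To make the acquire/release case concrete, here is a minimal sketch (variable and function names are purely illustrative) of the typical release/acquire pairing: the writer publishes a plain payload with a release store on a flag, and a reader that observes the flag with an acquire load is then guaranteed to also see the payload.
#include <atomic>
#include <cassert>

int data = 0;                   // ordinary, non-atomic payload
std::atomic<bool> ready{false}; // flag used to publish the payload

void producer() {
    data = 42;                                    // write the payload first
    ready.store(true, std::memory_order_release); // release: earlier writes can't sink below this
}

void consumer() {
    if (ready.load(std::memory_order_acquire)) {  // acquire: later reads can't hoist above this
        assert(data == 42);                       // guaranteed to see the payload
    }
}
On x86 these typically compile to plain loads and stores, and even on weaker ISAs they are generally cheaper than the fencing a seq_cst store may require.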
Upvotes: 7
Reputation: 88175
However, on cache-coherent architectures visibility is ensured automatically, and the MOV operation on 8-, 16-, 32- and 64-bit variables is guaranteed to be atomic.
Unless you strictly adhere to the requirements of the C++ spec to avoid data races, the compiler is not obligated to make your code function the way it appears to. For example:
int a = 0, b = 0; // shared variables, initialized to zero
a = 1;
b = 1;
Say you do this on your fully cache-coherent architecture. On such hardware it would seem that since a is written before b, no thread will ever be able to see b with a value of 1 without a also having that value.
But this is not the case. If you have failed to strictly adhere to the requirements of the C++ memory model for avoiding data races, e.g. you read these variables without the correct synchronization primitives being inserted anywhere, then your program may in fact observe b being written before a. The reason is that you have introduced "undefined behavior", and the C++ implementation has no obligation to do anything that makes sense to you.
What may be going on in practice is that the compiler may reorder writes even if the hardware works very hard to make it seem as if all writes occur in the order of the machine instructions performing the writes. You need the entire toolchain to cooperate, and cooperation from just the hardware, such as strong cache coherency, is not sufficient.
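As one possible fix (a sketch using the variable names from the example above, not the only correct approach), making b a std::atomic restores the intuitive guarantee: the write to a is sequenced before the atomic store to b, so a reader that sees b == 1 is guaranteed to also see a == 1.
#include <atomic>
#include <cassert>

int a = 0;              // ordinary shared variable
std::atomic<int> b{0};  // atomic flag used to publish `a` (default ordering is seq_cst)

void writer() {
    a = 1;  // sequenced before the atomic store below
    b = 1;  // atomic store; synchronizes-with the matching atomic load
}

void reader() {
    if (b == 1)          // atomic load
        assert(a == 1);  // guaranteed: the write to `a` happens-before this read
}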
The book C++ Concurrency in Action is a good source if you care to learn about the details of the C++ memory model and writing portable, concurrent code in C++.
Upvotes: 5