compiler reordering vs memory reordering

Question

Under gcc, the following instructions are available for setting a memory barrier. They provide different kinds of "protection":

asm volatile("" ::: "memory"); // compiler reorder
asm volatile("mfence" ::: "memory"); // memory reordering

In short, C++ atomics provide:

- acquire/release semantics
- Sequentially-consistent ordering

I'm wondering whether there is a direct mapping between the gcc primitives and the C++ atomic semantics. For instance (this must be wrong, it's just for the sake of explanation): acquire/release semantics prevent compiler reordering, and sequentially-consistent ordering prevents memory reordering.

Or maybe C++ doesn't make this distinction, and the language only offers semantics that apply to both kinds of reordering at the same time?

Leeor · Accepted Answer

The first barrier only applies during compilation. Once the compiler is done, it has no impact since nothing is added to the code. This could be useful to avoid some memory ordering issues (the compiler doesn't know how other threads may manipulate these memory locations, although hardly any compiler with normal settings would dare reorder variables with a potential for that).
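As a rough illustration of a compiler-only barrier, the sketch below puts one between two plain stores. The names `data_word`, `flag_word` and `publish_with_compiler_barrier` are made up for this example. On GCC or Clang it uses the empty `asm` statement from the question; elsewhere it falls back to `std::atomic_signal_fence`, which is the closest standard C++ counterpart (it constrains the compiler but emits no CPU fence instruction).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variables, for illustration only.
int data_word = 0;
int flag_word = 0;

int publish_with_compiler_barrier() {
    data_word = 1;   // first store
#if defined(__GNUC__)
    // GCC/Clang: tells the compiler that memory may have changed, so it
    // must not move memory accesses across this point. No instruction
    // is emitted, so the CPU is still free to reorder.
    asm volatile("" ::: "memory");
#else
    // Standard C++ fallback: a compiler-only fence.
    std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
    flag_word = 1;   // second store, kept after the first by the compiler
    return data_word + flag_word;
}
```

Calling `publish_with_compiler_barrier()` returns 2 in a single thread. The barrier matters only for how the compiler orders the two stores in the generated code; it gives no guarantee about the order in which another core observes them.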

However, this is far from enough since on modern out-of-order CPUs the hardware itself may reorder operations under the hood. To avoid that, you have ways to tell the HW to watch out, given the exact level and form of restriction you want to achieve (with sequential consistency being the most restrictive and "safe" ordering model, but usually also the most expensive in terms of performance).

To achieve these restrictions, you can try to manually maintain barriers and similar constructs that the ISA provides (through intrinsics, inline assembly, serializing operations, or any other trick). This is usually complicated even if you know what you're doing, and may even be micro-architecture specific (some CPUs may grant some restrictions "for free", making explicit fencing useless). So C++11 added the atomic semantics to make this task easier: the compiler now adds the necessary code for you, depending on the ordering model you specify.
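A minimal sketch of what "letting the compiler do it" can look like, using the classic message-passing pattern with release/acquire ordering. All names here (`payload`, `ready`, `producer`, `consumer`, `run_message_passing`) are hypothetical and chosen only for this example. The same source expresses both constraints at once: the compiler may not reorder across the atomic operations, and it emits whatever fence instructions the target CPU needs (possibly none).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag that publishes the data

void producer() {
    payload = 42;                                  // write the data
    // Release: earlier writes may not be moved below this store,
    // neither by the compiler nor by the hardware.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire: later reads may not be moved above this load.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;   // sees 42 once the flag has been observed
}

int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    assert(seen == 42);
    return seen;
}
```

`run_message_passing()` returns 42 on any conforming implementation. Note that there is no inline assembly and no architecture-specific code in it.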

In your example, the mfence is an example of doing things manually, but you also need to know where to apply it. Used correctly, the mfence can be strong enough to provide sequential consistency, but it is also very expensive: it acts as a full fence for both loads and stores (it covers at least what sfence and lfence do), which requires draining all pending stores from the internal buffers, a slow operation since the buffering exists precisely to let stores commit lazily. On the other hand, if you want acquire/release semantics, you can choose to implement them with proper partial fences at the correct places for your architecture, or let the compiler do that for you. If you choose the latter and run on an x86 machine, for example, you'll discover that most of the time nothing needs to be added, since stores have implicit release semantics and loads have implicit acquire semantics, but the same may not apply on other architectures.
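To make the difference between acquire/release and sequential consistency more concrete, here is a sketch of the well-known "store buffer" test. The names `x`, `y`, `both_loads_saw_zero` and `count_forbidden_outcomes` are made up for this example. Each thread stores to one variable and then loads the other. With `memory_order_seq_cst` the outcome where both loads return 0 is forbidden. With only release stores and acquire loads it would be allowed, even on x86, because a store can still sit in the store buffer while the following load executes. This is the case where an x86 compiler typically has to emit something like `mfence` or a locked instruction.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs one round of the test. Returns true if both threads read 0,
// which sequential consistency forbids.
bool both_loads_saw_zero() {
    x.store(0, std::memory_order_seq_cst);
    y.store(0, std::memory_order_seq_cst);
    int r1 = -1;
    int r2 = -1;
    std::thread a([&r1] {
        x.store(1, std::memory_order_seq_cst);   // store, then...
        r1 = y.load(std::memory_order_seq_cst);  // ...load the other variable
    });
    std::thread b([&r2] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome shows up.
int count_forbidden_outcomes(int rounds) {
    assert(rounds >= 0);
    int forbidden = 0;
    for (int i = 0; i < rounds; ++i) {
        if (both_loads_saw_zero()) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

With `seq_cst` on every operation, `count_forbidden_outcomes(n)` is 0 for any `n`. Weakening the stores to `memory_order_release` and the loads to `memory_order_acquire` removes that guarantee, although the weaker outcome may still be rare in practice.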

Here's a nice summary of the implementation of various ordering semantics per architecture - http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
