user8426277

Reputation: 657

Does it matter if non-read, non-write instructions are reordered on x86?

The mfence documentation says the following:

Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction.

As far as I know, there is no fence instruction in x86 that prevents the reordering of non read and non write instructions.

Now if my program has only one thread, even if the instructions are reordered, it would still seem as if the instructions are executing in order.

But what if my program has multiple threads, and in one of the threads the non read and non write instructions are reordered, will the other threads notice this reordering? (I assume the answer is no, or else there would be a fence instruction to stop non read and non write instructions from being reordered, or maybe I'm missing something.)

Upvotes: 2

Views: 149

Answers (1)

Peter Cordes

Reputation: 363999

will the other threads notice this reordering

No, other than through performance (timing, or direct measurement with HW performance counters) or microarchitectural side-channels (like ALU port pressure between logical cores that share a physical core with Hyperthreading / SMT): one thread can time itself to learn something about what the other hardware thread is executing.

The only "normal" way for threads to observe anything about each other is by loading data that other threads stored.

Even load ordering is only visible indirectly (by the effect it has on what the other thread decides to later store).
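To make that concrete, here's a minimal sketch (not from the question; the names `payload`, `ready`, and `run_message_passing` are made up for illustration) using C11 atomics and pthreads. The only thing the consumer can ever learn about the producer is what it loads from memory the producer stored to. However the producer's ALU instructions get scheduled internally, nothing about that is visible here.

```c
#include <pthread.h>
#include <stdatomic.h>

static int payload;        // plain data, written by the producer
static atomic_int ready;   // flag the consumer polls (hypothetical name)

static void *producer(void *arg) {
    (void)arg;
    payload = 42;  // ordinary store
    // Release store: the payload store above is visible before the flag is.
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    int *out = arg;
    // Acquire load: spin until the producer's flag store is observed.
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
    }
    *out = payload;  // this load is the only way we "see" the other thread
    return NULL;
}

// Runs one producer and one consumer; returns the value the consumer loaded.
int run_message_passing(void) {
    pthread_t p, c;
    int seen = -1;
    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Build with something like `gcc -O2 -pthread`. On x86 the release store and acquire load compile to plain `mov` instructions, since the hardware memory model already gives that ordering for normal WB memory.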


As far as I know, there is no fence instruction in x86 that prevents the reordering of non read and non write instructions.

On Intel CPUs (but not AMD), lfence does this. Intel's manual says so: this is not just an implementation detail, it's actually guaranteed for future microarchitectures.

Intel's LFENCE instruction-set reference manual entry:

LFENCE does not execute until all prior instructions have completed locally, and no later instruction begins execution until LFENCE completes.

(completed locally = retired from the out-of-order core, i.e. leaves the ROB).

lfence is not particularly useful as an actual load barrier, because x86 doesn't allow weakly-ordered loads from WB memory (only from WC). (Not even movntdqa or prefetchnta can create weakly-ordered loads from normal WB memory.) So unlike sfence, lfence is basically never needed for memory ordering, only for its special effects, like lfence ; rdtsc. Or for Spectre mitigation, to block speculative execution past it.
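As a rough sketch of the lfence ; rdtsc idiom (assuming GCC or Clang on x86, where `<x86intrin.h>` provides `_mm_lfence` and `__rdtsc`; the helper names `fenced_rdtsc`, `sum_to`, and `time_sum_to` are made up for illustration), something like this keeps the timestamp reads from being reordered with the work being timed on Intel CPUs:

```c
#include <stdint.h>
#include <x86intrin.h>  // _mm_lfence, __rdtsc (GCC/Clang, x86 only)

// Read the TSC with lfence on both sides (hypothetical helper).
static inline uint64_t fenced_rdtsc(void) {
    _mm_lfence();            // earlier instructions complete locally first
    uint64_t t = __rdtsc();
    _mm_lfence();            // later instructions don't start before rdtsc
    return t;
}

// Toy workload: sums 0 .. n-1. volatile keeps the loop from being folded away.
static uint64_t sum_to(uint64_t n) {
    volatile uint64_t acc = 0;
    for (uint64_t i = 0; i < n; i++) {
        acc += i;
    }
    return acc;
}

// Returns elapsed TSC ticks for sum_to(n); stores the sum in *result.
uint64_t time_sum_to(uint64_t n, uint64_t *result) {
    uint64_t start = fenced_rdtsc();
    *result = sum_to(n);
    uint64_t end = fenced_rdtsc();
    return end - start;
}
```

Note that the TSC counts reference cycles, not core clock cycles, so the result is a wall-clock-style interval rather than a count of core cycles.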


As an implementation detail, though, on Intel CPUs including at least Skylake, mfence is also a barrier for out-of-order execution. See "Are loads and stores the only instructions that gets reordered?" for that, and much more related stuff.

Upvotes: 5
