Reputation: 657
The mfence documentation says the following:
Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction.
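For illustration, the ordering described in the quoted text can be sketched as a store-buffering litmus test in portable C++. This is only a sketch under assumptions: `count_both_zero` is a hypothetical helper name, and `std::atomic_thread_fence(std::memory_order_seq_cst)` stands in for the instruction itself (on x86, compilers typically implement it with mfence or an equivalent locked instruction). With the fence between each thread's store and its later load, the outcome where both threads read 0 is forbidden.

```cpp
#include <atomic>
#include <thread>

// Hypothetical litmus-test helper (store buffering), not from the original post.
// Returns how many runs ended with BOTH threads reading 0, which the
// full fences below are expected to rule out.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            // Full barrier: the store above must be globally visible
            // before the load below is performed.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });

        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling `count_both_zero(200)` should return 0. Without the fences, x86 is allowed to let each load pass the earlier store, so both threads could read 0.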
As far as I know, there is no fence instruction in x86 that prevents the reordering of instructions that neither read nor write memory.
Now if my program has only one thread, then even if the instructions are reordered, it would still seem as if they executed in order.
But what if my program has multiple threads, and in one of them the instructions that neither read nor write memory are reordered: will the other threads notice this reordering? (I assume the answer is no, or else there would be a fence instruction to stop that reordering. Or maybe I'm missing something.)
Upvotes: 2
Views: 149
Reputation: 363999
will the other threads notice this reordering
No, other than performance (timing or direct measurement with HW performance counters). Or microarchitectural side-channels (like ALU port pressure for logical cores that share a physical core with Hyperthreading / SMT): one thread can time itself to learn something about what the other hardware thread is executing.
The only "normal" way for threads to observe anything about each other is by loading data that other threads stored.
Even load ordering is only visible indirectly (by the effect it has on what the other thread decides to later store).
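A minimal sketch of that point in C++, with hypothetical names (`observe_published_value` is not from the question): the writer does some register-only arithmetic, which the CPU is free to execute in any order, and then stores the result. The reader has no way to see the arithmetic itself, only the value that ends up in memory.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper for illustration only.
// Returns the value the reader thread loaded from the writer's store.
int observe_published_value() {
    std::atomic<bool> ready{false};
    int payload = 0;  // plain data, published through the flag
    int seen = -1;

    std::thread writer([&] {
        int a = 6, b = 7;
        int product = a * b;      // ALU work: invisible to other threads
        int sum = a + b;          // independent ALU work, order unobservable
        payload = product + sum;  // the store is what a reader can observe
        ready.store(true, std::memory_order_release);
    });

    std::thread reader([&] {
        // Spin until the writer's release store becomes visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // ordered after the flag load by acquire/release
    });

    writer.join();
    reader.join();
    return seen;
}
```

Whatever order the multiply and the add execute in, the reader sees the same final value, 55 in this sketch.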
As far as I know, there is no fence instruction in x86 that prevents the reordering of non read and non write instructions.
On Intel CPUs (but not AMD), lfence does this. Intel's manual says so, which means this is not just an implementation detail: it's guaranteed for future microarchitectures too.
Intel's LFENCE instruction-set reference manual entry:
LFENCE does not execute until all prior instructions have completed locally, and no later instruction begins execution until LFENCE completes.
(completed locally = retired from the out-of-order core, i.e. leaves the ROB).
lfence is not particularly useful as an actual load barrier, because x86 doesn't allow weakly-ordered loads from WB memory (only from WC). (Not even movntdqa or prefetchnta can create weakly-ordered loads from normal WB memory.) So unlike sfence, lfence is basically never needed for memory ordering, only for its special effects, like lfence; rdtsc. Or for Spectre mitigation, to block speculative execution past it.
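As a rough sketch of the lfence; rdtsc pattern, here it is written with compiler intrinsics. The assumptions: GCC or Clang on x86-64, where `<x86intrin.h>` provides `_mm_lfence()` and `__rdtsc()`, and an Intel CPU, so that lfence has the execution-ordering behavior quoted above. `fenced_rdtsc` and `time_in_tsc_ticks` are hypothetical helper names, and the non-x86 fallback simply returns 0 so the code still compiles elsewhere.

```cpp
#include <cstdint>

#if defined(__x86_64__)
#include <x86intrin.h>  // _mm_lfence, __rdtsc (GCC/Clang; assumption)
#define HAVE_X86_TSC 1
#else
#define HAVE_X86_TSC 0
#endif

// Hypothetical helper: read the time-stamp counter with lfence on both sides.
std::uint64_t fenced_rdtsc() {
#if HAVE_X86_TSC
    _mm_lfence();                 // wait for earlier instructions to complete
    std::uint64_t t = __rdtsc();  // read the TSC
    _mm_lfence();                 // keep later instructions from starting early
    return t;
#else
    return 0;  // placeholder on non-x86 targets, not a real timestamp
#endif
}

// Hypothetical helper: time a callable in TSC reference ticks.
template <class F>
std::uint64_t time_in_tsc_ticks(F&& work) {
    std::uint64_t start = fenced_rdtsc();
    work();
    std::uint64_t end = fenced_rdtsc();
    return end - start;
}
```

Usage would look like `auto ticks = time_in_tsc_ticks([] { /* code under test */ });`. The result is in TSC reference ticks, not core clock cycles, and it includes the overhead of the fences themselves.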
But as an implementation detail, on Intel CPUs including at least Skylake, mfence is a barrier for out-of-order execution. See "Are loads and stores the only instructions that gets reordered?" for that, and much more related stuff.
Upvotes: 5