Reputation: 64925
On x86, lock
-prefixed instructions such as lock cmpxchg
provide barrier semantics in addition to their atomic operation: for normal memory access on write-back memory regions, reads and writes are not re-ordered across lock
-prefixed instructions, per section 8.2.2 of Volume 3 of the Intel SDM:
Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions.
This section applies only to write-back memory types. In the same list, you find an exception where it notes that weakly ordered stores are not ordered:
- Reads are not reordered with other reads.
- Writes are not reordered with older reads.
- Writes to memory are not reordered with other writes, with the following exceptions: —
streaming stores (writes) executed with the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and —
string operations (see Section 8.2.4.1).
Note that there is no exception made for non-temporal instructions in any other items in the list, e.g., in the item referring to lock-prefixed instructions.
In various other sections of the guide, it is mentioned that the mfence
and/or sfence
instructions can be used to order memory when weakly ordered (non-temporal) instructions are used. These sections generally don't mention lock
-prefixed instruction as an alternative.
All that leaves me uncertain: do lock
-prefixed instructions provide the same full barrier that mfence
provides between weakly ordered (non-temporal) instructions on WB memory? The same question applies again but to any type of access on WC memory.
Upvotes: 6
Views: 1062
Reputation: 23669
On all 64-bit AMD processors, MFENCE
is a fully serializing instruction and the Lock-prefixed instructions are not. However, both serialize all memory accesses according to the AMD manual V2 7.4.2:
All previous loads and stores complete to memory or I/O space before a memory access for an I/O, locked or serializing instruction is issued.
All loads and stores associated with the I/O and locked instructions complete to memory (no buffered stores) before a load or store from a subsequent instruction is issued.
There are no exceptions or erratum related to the serialization properties of these instructions.
It's clear from the Intel manual and documents that both serialize all stores with no exceptions or related erratum. MFENCE
also serializes all loads, with one errata documented for most processors based on Skylake, Kaby Lake, and Coffee Lake microarchitectures, which states that MOVNTDQA
from WC memory may passs earlier MFENCE
instructions. In addition, many processors based on the Nehalem, Sandy Bridge, Ivy Bridge, Haswell, Broadwell, Skylake, Kaby Lake, Coffee Lake, and Silvermont microarchitectures have an errata that says that MOVNTDQA
from WC memory may passs earlier locked instructions. Processors based on the Core, Westmere, Sunny Cove, and Goldmont microarchitectures don't have this errata.
The quote from Necrolis's answer says that the lock prefix may not serialize load operations that reference weakly ordered memory types on the Pentium 4 processors. My understanding is that this looks like a bug in the Pentium 4 processors and it doesn't apply to any other processors. Although it's worth noting that it's not documented in the spec update documents of the Pentium 4 processors.
@PeterCordes's experiments show that, on Skylake, locking instructions don't seem to block ALU instructions from being executed out-of-order while mfence
does serialize ALU instructions (potentially behaving identically to lfence
+ a store-buffer flush like a locked instruction). However, I think this is an implementation detail.
Upvotes: 7
Reputation: 26171
Bus locks (via the LOCK
opcode prefix) produce a full fence*, however, on WC memory they don't provide the load fence, this is documented in the Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, 8.1.2:
For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon processors, with one exception. Load operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized.
*See Intel's 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, 8.2.3.9 for an example
Upvotes: 3