Understanding cmpxchg8b/cmpxchg16b operation

Question

The SDM text for this instruction has the following block:

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination.

I have trouble understanding the last sentence (and probably the whole paragraph).

"The destination operand is written back" ... back to what?
"...; otherwise, the source operand is written into the destination": what is the source operand? Is it ECX:EBX? As far as I can understand, this CAS instruction only takes one operand (the memory destination).

Would appreciate if someone could rephrase and/or explain this bit about the unconditional write.

Peter Cordes · Accepted Answer

Compare with the wording for regular cmpxchg r/m32, r32 (which has an explicit instead of implicit source) and it should make more sense. In particular, compare the short descriptions in the table of forms at the top of each manual entry. I've annotated them with dst, src, and implicit. Note that Intel syntax in general is op dst, src.

cmpxchg r/m64, r64: Compare RAX (implicit) with r/m64 (dst). If equal, ZF is set and r64 (src) is loaded into r/m64 (dst). Else, clear ZF and load r/m64 (dst) into RAX (implicit).
cmpxchg16b m128: Compare RDX:RAX (implicit) with m128 (dst). If equal, set ZF and load RCX:RBX (implicit) into m128 (dst). Else, clear ZF and load m128 (dst) into RDX:RAX (implicit).

Yes, that's right: Intel's manual uses "loaded" to describe a store to memory. (Slightly justifiable for cmpxchg, where the destination can be a register; not at all for cmpxchg16b.)

But anyway, it can help to keep in mind that these implement:

m64.compare_exchange_strong(expected=RAX, desired=r64);
m128.compare_exchange_strong(expected=RDX:RAX, desired=RCX:RBX);

(In terms of C++ std::atomic. To actually be atomic, the instructions require the lock prefix; without it, the operation is a non-atomic RMW. Mainstream compilers only ever compile C++ atomics to lock cmpxchg / lock cmpxchg16b, never to an un-locked cmpxchg.)
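To make that mapping concrete, here is a minimal sketch in portable C++ for the 64-bit case. The names `CasResult`, `cas64` and `cas64_demo` are made up for illustration; only `std::atomic` and `compare_exchange_strong` are real library API. The point is that `expected` plays the role of RAX: it is left alone on success and overwritten with the current memory value on failure.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// What one CAS attempt leaves behind (hypothetical struct, for illustration).
struct CasResult {
    bool zf;            // success flag, analogous to ZF after cmpxchg
    std::uint64_t rax;  // value of "expected" afterwards, analogous to RAX
};

// Hypothetical helper: one strong CAS on a 64-bit atomic.
inline CasResult cas64(std::atomic<std::uint64_t>& mem,
                       std::uint64_t expected, std::uint64_t desired) {
    // On failure, compare_exchange_strong stores the current value of mem
    // into expected; on success, expected is unchanged.
    bool ok = mem.compare_exchange_strong(expected, desired);
    return {ok, expected};
}

inline void cas64_demo() {
    std::atomic<std::uint64_t> mem{5};

    CasResult hit = cas64(mem, 5, 9);   // expected matches: mem becomes 9
    assert(hit.zf && hit.rax == 5 && mem.load() == 9);

    CasResult miss = cas64(mem, 5, 1);  // expected is stale: mem stays 9
    assert(!miss.zf && miss.rax == 9 && mem.load() == 9);
}
```

Whether a compiler turns this into lock cmpxchg depends on the target and options, so treat the sketch as a statement about semantics, not about generated code.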

"The destination operand is written back" ... back to what?

The old value of the destination (which was just loaded) is written back. This means cmpxchg16b is always a write, and will, for example, always set the dirty flag for the page. (The question "Does cmpxchg write destination cache line on failure? If not, is it better than xchg for spinlock?" asks whether it truly microarchitecturally dirties the cache line on CAS failure. I assume so, but haven't checked.)

This was historically important for the lock prefix on older CPUs, where there was an external LOCK# pin that lock cmpxchg actually asserted for the whole load+store pair. Modern CPUs just hold onto a cache lock on the affected cache line for the duration, for aligned lock CAS on cacheable memory. That's why the manual says "To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison."

The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)
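As a rough illustration of that sentence, here is a non-atomic software model of cmpxchg16b in C++. Everything in it (`U128`, `Mem128`, `cmpxchg16b_model`, the store counter) is a made-up name for the sketch, not anything from the manual. It only mirrors the architectural description: one store to the destination on either path. It says nothing about what a real CPU does microarchitecturally.

```cpp
#include <cassert>
#include <cstdint>

// A 128-bit value as a high:low register pair, e.g. RDX:RAX or RCX:RBX.
struct U128 {
    std::uint64_t hi;
    std::uint64_t lo;
};

inline bool same(const U128& a, const U128& b) {
    return a.hi == b.hi && a.lo == b.lo;
}

// Hypothetical memory cell that counts stores so the write is observable.
struct Mem128 {
    U128 value;
    int stores;
    void store(const U128& v) { value = v; ++stores; }
};

// Non-atomic model of "cmpxchg16b m128". Returns the resulting ZF.
inline bool cmpxchg16b_model(Mem128& dst, U128& rdx_rax, const U128& rcx_rbx) {
    const U128 old = dst.value;  // the read half of the RMW
    if (same(old, rdx_rax)) {
        dst.store(rcx_rbx);      // equal: "source" RCX:RBX goes to memory
        return true;             // ZF = 1
    }
    dst.store(old);              // not equal: old value is written back
    rdx_rax = old;               // and also loaded into RDX:RAX
    return false;                // ZF = 0
}

inline void cmpxchg16b_demo() {
    Mem128 mem{U128{1, 2}, 0};
    U128 expected{1, 2};

    bool zf = cmpxchg16b_model(mem, expected, U128{3, 4});  // hit
    assert(zf && mem.stores == 1 && mem.value.hi == 3 && mem.value.lo == 4);

    zf = cmpxchg16b_model(mem, expected, U128{5, 6});       // miss: stale expected
    assert(!zf && mem.stores == 2);                         // still one more store
    assert(expected.hi == 3 && expected.lo == 4);           // RDX:RAX updated
}
```

In the model, "written back" means `dst.store(old)`: memory ends up with the same bits it had, but a store still happens.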

This whole paragraph was copy-pasted from the cmpxchg manual entry when Intel was writing the cmpxchg16b entry; it's less clear in the CX16 context because that has 2 implicit operands instead of an explicit source and a read-write RAX. It doesn't define the term "source operand".

Earlier in the description, it does define the term "destination operand" for that instruction:

Compares the 64-bit value in EDX:EAX (or 128-bit value in RDX:RAX if operand size is 128 bits) with the operand (destination operand)

Here "the operand" means the explicit operand. This is obviously what's meant, because it's the only operand that can be memory, so it must be one of the things being compared. Ordinary English usage points the same way.

So "destination operand" does get clearly defined, but it's poor writing to say "source operand" without defining it, in an instruction with 3 total operands. Like I said, clearly a result of copy/pasta by Intel's documentation writers.

It's not a serious problem; we know the basic point of the instruction and the Operation section makes it 100% clear what actually happens.
