Reputation: 137
I am reading a bit about the MESI protocol for cache coherence. I have read that atomic operations in x86-64, such as XCHG, acquire the cache line in exclusive mode.
But according to the protocol, the cache line can transition to the Shared or Invalid state if another core reads or writes a memory location in that cache line. So can this happen while a core is executing an atomic operation? And if not, how is it prevented?
Upvotes: 1
Views: 520
Reputation: 2236
In addition to the MESI states, all (?) cache coherence protocols have "transient" states that are used while transitions among the MESI states are taking place. For example, when a cache requests an S to M transition, the requesting cache has to wait until all of the other caches (or equivalent directories) acknowledge that they have invalidated the cache line before the M state can be granted. During this interval, other transactions referencing the transient cache line have to be deferred; otherwise a cache would never be able to complete an "upgrade" transaction on a cache line that other cores are reading. Atomic operations require a read and an update to the same line without allowing any other agent to operate on the line in the middle of the transaction. Perhaps the most straightforward way to implement this is to extend one or more transient states to "protect" the cache line during the read+write transaction.
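To make the deferral idea concrete, here is a toy single-line model of an S to M upgrade. It is only a sketch under simplifying assumptions: the transient state name `SM_A`, the `CacheLine` class, and the "read"/"invalidate" snoop kinds are made up for illustration, write-backs and the interconnect are not modeled, and real controllers have many more transient states. The point it shows is that a snoop arriving while the line is transient is queued, and it is only answered after M is granted and the pending store has completed.

```python
from enum import Enum


class State(Enum):
    I = "I"
    S = "S"
    E = "E"
    M = "M"
    SM_A = "SM_A"  # hypothetical transient state: S->M upgrade issued, waiting for acks


class CacheLine:
    """Toy model of one cache's copy of a line, as seen by its coherence controller."""

    def __init__(self, state=State.S, data=0):
        self.state = state
        self.data = data
        self.pending_acks = 0
        self.pending_store = None
        self.deferred = []  # snoops that arrived while the line was transient

    def begin_upgrade(self, sharers, value):
        """Request M so `value` can be stored; every other sharer must ack."""
        self.state = State.SM_A
        self.pending_acks = sharers
        self.pending_store = value

    def snoop(self, kind):
        """Handle a request from another core ('read' or 'invalidate').
        Returns the data supplied, or None if the snoop was deferred."""
        if self.state == State.SM_A:
            self.deferred.append(kind)
            return None
        return self._apply(kind)

    def _apply(self, kind):
        if self.state == State.I:
            return None  # no copy to supply
        data = self.data
        if kind == "invalidate":
            self.state = State.I
        elif self.state in (State.M, State.E):
            self.state = State.S  # downgrade; a real cache would also write back M data
        return data

    def receive_ack(self):
        """One invalidation ack arrives. On the last one, grant M, complete the
        store, then replay deferred snoops and return their replies."""
        self.pending_acks -= 1
        if self.pending_acks > 0:
            return []
        self.state = State.M
        self.data = self.pending_store  # the store completes before any replay
        self.pending_store = None
        replies = [self._apply(kind) for kind in self.deferred]
        self.deferred.clear()
        return replies
```

For example, with two other sharers, a read that arrives mid-upgrade gets no answer until both acks are in; it then sees the new value (42) and the line drops back to S.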
Upvotes: 1
Reputation: 363980
The CPU core that owns that line simply chooses not to process and respond to requests to share or invalidate that line until after the atomic RMW operation has completed.
The detailed mechanism in modern CPUs is probably based on microcode: one of the uops for xchg [mem], reg probably does a special kind of load that "locks" that cache line (once it's exclusively owned in this core's L1d, if it wasn't already), and one of the final uops does a special kind of store that also "unlocks" it, for this internal locking mechanism that's only usable by microcode.
(Opening that up to separate x86 instructions locking and unlocking would create the possibility of deadlocking the system. Making it internal to one instruction's microcode can ensure that the max lock-hold time is very low and can't be broken up by an interrupt.)
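The speculated load-lock / store-unlock pair can be sketched as a toy model. Everything here is hypothetical and illustrative only: `LockedLineCore`, `load_lock`, `store_unlock`, and the `locked` flag are invented names, not documented microcode, and the model assumes the line is already in E or M when the lock is taken. It shows the behavior described above: snoops ("read" to share, "rfo" for read-for-ownership/invalidate) that arrive between the two uops are held, and they are only served after the new value is stored, so no other core can observe or steal the line mid-RMW.

```python
class LockedLineCore:
    """Hypothetical model: one core's L1d copy of a single cache line plus a
    microcode-only 'locked' flag held from the load-lock uop to the store-unlock uop."""

    def __init__(self, value=0):
        self.state = "M"  # assume the line is already exclusively owned (E or M)
        self.value = value
        self.locked = False
        self.deferred = []  # (requester, kind) snoops queued while locked

    def snoop(self, requester, kind):
        """Another core asks to 'read' (share) or 'rfo' (invalidate) the line.
        Returns (requester, value) if served now, or None if deferred."""
        if self.locked:
            self.deferred.append((requester, kind))
            return None
        return self._serve(requester, kind)

    def _serve(self, requester, kind):
        if self.state == "I":
            return (requester, None)  # no copy to supply
        value = self.value
        self.state = "I" if kind == "rfo" else "S"
        return (requester, value)

    def load_lock(self):
        if self.state not in ("E", "M"):
            raise RuntimeError("line must be exclusively owned before locking")
        self.locked = True
        return self.value

    def store_unlock(self, new_value):
        self.value = new_value
        self.state = "M"
        self.locked = False
        queued, self.deferred = self.deferred, []
        return [self._serve(r, k) for r, k in queued]  # answer held requests in order

    def xchg(self, new_value, snoops_during=()):
        """Model of xchg [mem], reg: returns (old value, replies to deferred snoops)."""
        old = self.load_lock()
        for requester, kind in snoops_during:  # these arrive between the two uops
            self.snoop(requester, kind)
        replies = self.store_unlock(new_value)
        return old, replies
```

Because the lock is only taken and released inside one modeled instruction, the hold time is bounded by the gap between two uops, which mirrors the deadlock argument above: nothing outside the instruction can leave the line locked.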
Related: I wrote a more general answer about x86 atomic RMW operations on Can num++ be atomic for 'int num'?
Upvotes: 2