FrozenHeart
FrozenHeart

Reputation: 20756

How does CAS instructions guarantee atomicity

According to Wiki, CAS do something like this:

function cas(p : pointer to int, old : int, new : int) returns bool {
    if *p ≠ old {
        return false
    }
    *p ← new
    return true
}

Well, it seems for me that if several processors will try to execute CAS instruction with the same arguments, there can be several write attempts at the same time so it's not safe to do it anyway.

Where am I wrong?

Upvotes: 6

Views: 3739

Answers (2)

Peter Cordes
Peter Cordes

Reputation: 364573

Atomic read-compare-write instructions from multiple cores at the same time (on the same cache line) do contend with each other, but it's up to hardware to sort that out. Hardware arbitration of atomic RMW instructions is a real thing in modern CPUs, and provides some degree of fairness so that one thread spinning on lock cmpxchg can't totally block other threads doing the same thing.

(Although that's a bad design unless your retry could succeed without waiting for another thread to modify anything, e.g. a retry loop that implements fetch_or or similar can try again with the updated value of expected. But if waiting for a lock or flag to change, after the initial CAS fails, it's better to spin on an acquire or relaxed load and only do the CAS if it might succeed.)

There's no guarantee what order they happen in, which is why you need to carefully design your algorithm so that correctness only depends on that compare-and-exchange being atomic. (The ABA problem is a common pitfall).


BTW, that entire block of pseudocode happens as a single atomic operation. Making a read-compare-write or read-modify-write happen as a single atomic operation is much harder for the hardware than just stores, which MESIF/MOESI handle just fine.

are you sure? I thought that it's unsafe to do that because, for example, x86 doesn't guarantee atomicity of writes for non-aligned DWORDs

lock cmpxchg makes the operation atomic regardless of alignment. It's potentially a lot slower for unaligned, especially on cache-line splits where atomically modifying a single cache line isn't enough.

See also Atomicity on x86 where I explain what it means for an operation to be atomic.

Upvotes: 8

Ville Krumlinde
Ville Krumlinde

Reputation: 7131

If you read the wiki it says that CAS "is an atomic version of the following pseudocode" of the code you posted. Atomic means that the code will execute without interruptions from other threads. So even if several threads try to execute this code at the same time with the same arguments (like you suggest) only one of them will return true, because in practice they will not execute simultaneously since the atomicity require they run in isolation.

And since you mention "x86 doesn't guarantee atomicity of writes for non-aligned DWORDs", this is not an issue here either because the atomic property of the cas function.

Upvotes: 2

Related Questions