Stephan Dollberg

Reputation: 34608

Why does std::atomic_compare_exchange update the expected value?

Why do std::atomic_compare_exchange and all its brothers and sisters update the passed expected value?

I am wondering if there are any reasons besides the resulting simplicity in loops, e.g.: is there an intrinsic function which can do that in one operation to improve performance?

Upvotes: 4

Views: 1464

Answers (3)

BeeOnRope

Reputation: 65046

I am wondering if there are any reasons besides the resulting simplicity in loops, e.g.: is there an intrinsic function which can do that in one operation to improve performance?

Yes. For example, on x86 this compare-and-swap (CAS) operation will be implemented by cmpxchg, and this instruction updates the expected value (passed in rax) when the CAS fails. Arm CAS looks to be the same (though I couldn't actually tell from the documentation; I had to examine the assembly).

The outcome can be seen on Godbolt, where returning the updated expected value after a CAS requires no second read: we may just return the register value, which has already been loaded with the value actually read.
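As a rough illustration of the kind of function meant here (a minimal sketch; the helper name cas_observed is made up for this example and is not from the linked code):

```cpp
#include <atomic>

// Hypothetical helper: tries to replace `expected` with `desired` in `target`
// and returns the value that the CAS observed in `target`.
int cas_observed(std::atomic<int>& target, int expected, int desired) {
    target.compare_exchange_strong(expected, desired);
    // On failure, `expected` has been overwritten with the value actually
    // seen in `target`. On success it is untouched and equals the old value.
    // Either way, no second load of `target` is needed.
    return expected;
}
```

On x86 one would expect a compiler to return the register that lock cmpxchg already filled, rather than emitting a separate load, though the exact output depends on the compiler and flags.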

That C++ defines the operation in this way pretty much means the underlying hardware CAS (if CAS is used) must behave this way for efficient operation. If you had a CAS which returned a failure indication but not the value, you can't simply emulate the atomic behavior by a subsequent read: it wouldn't be atomic and could even return the "desired" value which is an impossible result. So I think you'd need a CAS loop to implement CAS!
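To make the "CAS loop to implement CAS" point concrete, here is a sketch of what that emulation might look like. Both function names are hypothetical; cas_bool_only merely stands in for a primitive that reports success or failure without the observed value, and everything uses the default sequentially consistent ordering:

```cpp
#include <atomic>

// Hypothetical stand-in for a CAS primitive that only reports success.
// `expected` is taken by value, so the caller never sees the observed value.
bool cas_bool_only(std::atomic<int>& target, int expected, int desired) {
    return target.compare_exchange_strong(expected, desired);
}

// Sketch of a value-returning strong CAS built on the bool-only one.
// Returns the old value; the caller checks success with `result == expected`.
int cas_value(std::atomic<int>& target, int expected, int desired) {
    for (;;) {
        if (cas_bool_only(target, expected, desired)) {
            return expected;  // success: the old value was `expected`
        }
        int seen = target.load();
        if (seen != expected) {
            return seen;      // a value that is consistent with the failure
        }
        // `target` changed back to `expected` between the failed CAS and
        // the load; returning it would look like success, so try again.
    }
}
```

The extra load and the retry are exactly the overhead that the standard interface avoids by writing the observed value back into expected.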

See an answer and discussion in comments on a Java 8 Q&A about implementing compareAndExchange on top of the older compareAndSet that only returns a bool status, not the updated value. For some use-cases (but maybe not all), a stale old value on failure could be indistinguishable from spurious failure for compare_exchange_weak, but maybe not generally equivalent enough for a compiler to skip looping even for compare_exchange_weak on a hypothetical ISA that didn't leave the load result in a register. And definitely not for compare_exchange_strong.

Also related: GNU C legacy __sync builtins had both __sync_bool_compare_and_swap and __sync_val_compare_and_swap. So you could get one or the other of the old value or the bool status as a return value, with args taken by value. Using __sync_val_compare_and_swap, you check for success with retval == expected (since it's not a "weak" CAS; neither is the bool version). You can't get the value from just the bool, but you can get the bool from the value (and hopefully compilers would optimize that into using the FLAGS output of lock cmpxchg on x86 instead of an extra cmp on the potentially-updated integer register).
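A small sketch of deriving the bool from the value with the legacy builtin (GCC/Clang only, not standard C++; the wrapper name cas_via_val is made up for this example):

```cpp
// Hypothetical wrapper: performs a strong CAS with the legacy GNU builtin
// and derives the success flag from the returned old value.
bool cas_via_val(int* ptr, int expected, int desired) {
    // __sync_val_compare_and_swap returns the value that was in *ptr
    // before the operation; the swap happened iff that equals `expected`.
    return __sync_val_compare_and_swap(ptr, expected, desired) == expected;
}
```

Calling __sync_bool_compare_and_swap(ptr, expected, desired) directly should give the same flag, but there is no way to go in the other direction and recover the old value from it.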

Upvotes: 2

Jonathan Wakely

Reputation: 171433

The processor has to load the current value, in order to do the "compare" part of the operation. When the comparison fails the caller needs to know the new value, to retry the compare-exchange (you almost always use it in a loop), so if it wasn't returned (e.g. by modifying the expected value that is passed by reference) then the caller would need to do another atomic load to get the new value. That's wasteful, because the processor has already loaded the value. You should only be messing about with low-level atomic operations when extreme performance is the only option, so in that case you do not want to perform two operations when one will do.
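For illustration, a typical retry loop might look like the following sketch (fetch_max is a hypothetical helper written for this example, not a standard library function):

```cpp
#include <atomic>

// Hypothetical helper: atomically raises `target` to at least `candidate`
// and returns the value it held beforehand.
int fetch_max(std::atomic<int>& target, int candidate) {
    int observed = target.load();
    // When compare_exchange_weak fails, it stores the current value of
    // `target` into `observed`, so the loop re-checks the condition against
    // fresh data without issuing another explicit load.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate)) {
        // Nothing to do: `observed` was refreshed by the failed CAS.
    }
    return observed;
}
```

If the expected value were not updated, the loop body would need its own target.load() on every failed attempt.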

Upvotes: 10

jthill

Reputation: 60555

is there an intrinsic function which can do that in one operation to improve performance

That can do what, specifically? The instruction has to load the current value to do the comparison, so on a mismatch yielding the current value costs nothing and is pretty much guaranteed to be useful.

Upvotes: 2
