Reputation: 13420
The following example comes from the MSDN.
using System.Threading;

public class ThreadSafe
{
    // Field totalValue contains a running total that can be updated
    // by multiple threads. It must be protected from unsynchronized
    // access.
    private float totalValue = 0.0F;

    // The Total property returns the running total.
    public float Total { get { return totalValue; } }

    // AddToTotal safely adds a value to the running total.
    public float AddToTotal(float addend)
    {
        float initialValue, computedValue;
        do
        {
            // Save the current running total in a local variable.
            initialValue = totalValue;

            // Add the new value to the running total.
            computedValue = initialValue + addend;

            // CompareExchange compares totalValue to initialValue. If
            // they are not equal, then another thread has updated the
            // running total since this loop started. CompareExchange
            // does not update totalValue. CompareExchange returns the
            // contents of totalValue, which do not equal initialValue,
            // so the loop executes again.
        }
        while (initialValue != Interlocked.CompareExchange(ref totalValue,
            computedValue, initialValue));

        // If no other thread updated the running total, then
        // totalValue and initialValue are equal when CompareExchange
        // compares them, and computedValue is stored in totalValue.
        // CompareExchange returns the value that was in totalValue
        // before the update, which is equal to initialValue, so the
        // loop ends.

        // The method returns computedValue, not totalValue, because
        // totalValue could be changed by another thread between
        // the time the loop ends and the time the method returns.
        return computedValue;
    }
}
Shouldn't totalValue be declared as volatile to get the freshest value possible? I imagine that if you do get a stale value from a CPU cache, the call to Interlocked.CompareExchange would take care of getting the freshest value and cause the loop to try again. Would the volatile keyword potentially save one unnecessary loop iteration?
I guess it isn't 100% necessary to have the volatile keyword, because the method has overloads that take data types such as long that don't support the volatile keyword.
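For comparison, the same optimistic retry pattern can be sketched in Java (whose volatile semantics are often compared with C#'s). Java has no atomic float/double, so this sketch stores the total as the raw bits of a double inside an AtomicLong; the class and method names here are mine, not from the MSDN sample:

```java
import java.util.concurrent.atomic.AtomicLong;

public class ThreadSafeTotal {
    // The running total, stored as the raw bits of a double so that
    // compareAndSet can operate on it atomically.
    private final AtomicLong totalBits =
        new AtomicLong(Double.doubleToLongBits(0.0));

    public double addToTotal(double addend) {
        long initialBits;
        double computedValue;
        do {
            // Read the current total; AtomicLong.get has volatile semantics.
            initialBits = totalBits.get();
            computedValue = Double.longBitsToDouble(initialBits) + addend;
            // compareAndSet succeeds only if no other thread has updated
            // the total since the read above; otherwise, loop and retry.
        } while (!totalBits.compareAndSet(initialBits,
                Double.doubleToLongBits(computedValue)));
        return computedValue;
    }
}
```

As in the C# version, the read at the top of the loop needs no extra fencing for correctness: a stale value just makes the compare-and-swap fail and the loop retry.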
Upvotes: 3
Views: 882
Reputation: 43845
You could get some insights by studying the source code of the ImmutableInterlocked.Update method:
/// <summary>
/// Mutates a value in-place with optimistic locking transaction semantics
/// via a specified transformation function.
/// The transformation is retried as many times as necessary to win
/// the optimistic locking race.
/// </summary>
public static bool Update<T>(ref T location, Func<T, T> transformer)
    where T : class?
{
    Requires.NotNull(transformer, nameof(transformer));
    bool successful;
    T oldValue = Volatile.Read(ref location);
    do
    {
        T newValue = transformer(oldValue);
        if (ReferenceEquals(oldValue, newValue))
        {
            // No change was actually required.
            return false;
        }

        T interlockedResult = Interlocked.CompareExchange(ref location,
            newValue, oldValue);
        successful = ReferenceEquals(oldValue, interlockedResult);
        oldValue = interlockedResult; // we already have a volatile read
                                      // that we can reuse for the next loop
    }
    while (!successful);
    return true;
}
You can see that the method starts by making a volatile read on the location argument. I can think of two possible reasons for that:

1. To avoid the relatively expensive Interlocked.CompareExchange operation, in case the new value happens to be the same as the already stored value.
2. The transformer delegate has an unknown computational complexity, so invoking it on a potentially stale value could be much more costly than the cost of the initial Volatile.Read.

Note: In retrospect, reason #2 above is most likely invalid, and I am just perpetuating the myths that programmers believe about CPU caches.
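A rough Java analogue of that method (a sketch, with names of my own choosing) can use AtomicReference.compareAndExchange, which returns the witness value, just as the C# code reuses the result of Interlocked.CompareExchange for the next iteration:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class OptimisticUpdate {
    // Retries the transformation until it wins the optimistic-locking
    // race. Returns false if the transformer returned the same
    // reference, i.e. no update was actually required.
    public static <T> boolean update(AtomicReference<T> location,
                                     UnaryOperator<T> transformer) {
        T oldValue = location.get();   // volatile read, like Volatile.Read
        while (true) {
            T newValue = transformer.apply(oldValue);
            if (oldValue == newValue) {
                return false;          // no change required; skip the CAS
            }
            // compareAndExchange returns the value it found in location.
            T witness = location.compareAndExchange(oldValue, newValue);
            if (witness == oldValue) {
                return true;           // we won the race
            }
            // Another thread updated location; reuse the value the CAS
            // returned, just as the C# code reuses interlockedResult.
            oldValue = witness;
        }
    }
}
```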
Upvotes: 3
Reputation: 36629
It does not matter, since Interlocked.CompareExchange inserts memory barriers.
initialValue = totalValue;
At this point totalValue could be anything: a stale value from the cache, a value another thread just replaced, who knows. While volatile would prevent reading a cached value, the value could become stale immediately after it was read, so volatile does not solve anything.
Interlocked.CompareExchange(ref totalValue, computedValue, initialValue)
At this point we have memory barriers that ensure totalValue is up to date. If it is equal to initialValue, then we also know that initialValue was not stale when we started the computation. If it is not equal, we try again, and since we have issued a memory barrier, we do not risk reading the same stale value on the next iteration.
Edit: I find it very unlikely that there would be any performance difference. If there is no contention, there is little reason for the value to be stale. If there is high contention, the time will be dominated by the looping.
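That fail-and-retry behaviour is easy to observe in a single thread by simulating the interfering write. Here is a small sketch in Java, where AtomicInteger.compareAndSet plays the role of Interlocked.CompareExchange:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StaleReadDemo {
    public static void main(String[] args) {
        AtomicInteger total = new AtomicInteger(10);

        // A thread reads the current value and computes an update...
        int initialValue = total.get();          // reads 10
        int computedValue = initialValue + 5;    // 15

        // ...but another thread updates the total first.
        total.set(11);

        // The CAS sees that total no longer equals initialValue,
        // fails, and leaves total untouched, so the loop in the
        // question would go around again.
        boolean swapped = total.compareAndSet(initialValue, computedValue);
        System.out.println(swapped + " " + total.get());  // false 11

        // Retrying with a fresh read succeeds.
        initialValue = total.get();              // reads 11
        swapped = total.compareAndSet(initialValue, initialValue + 5);
        System.out.println(swapped + " " + total.get());  // true 16
    }
}
```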
Upvotes: 1
Reputation: 365467
No, volatile wouldn't be helpful at all, and certainly not for this reason. It would just give that first read "acquire" semantics instead of effectively relaxed, but either way will compile to similar asm that runs a load instruction.
if you get a dirty value from a CPU cache
CPU caches are coherent, so anything you read from CPU cache is the current globally agreed-on value for this line. "Dirty" just means it doesn't match DRAM contents, and will have to get written-back if / when evicted. A load value can also be forwarded from the store buffer, for a value this thread stored recently that isn't yet globally visible, but that's fine, Interlocked methods are full barriers that result in waiting for the store buffer to drain as well.
If you mean stale, then no, that's impossible, cache coherency protocols like MESI prevent that. This is why Interlocked things like CAS aren't horribly slow if the cache line is already owned by this core (MESI Modified or Exclusive state). See Myths Programmers Believe about CPU Caches which talks some about Java volatiles, which I think are similar to C# volatile.
This C++11 answer also explains some about cache coherency and asm. (Note that C++11 volatile is significantly different from C#'s, and doesn't imply any thread-safety or ordering, but does still imply that the asm must do a load or a store, not optimize the access into a register.)
On non-x86, running extra barrier instructions after the initial read (to give those acquire semantics) before you even try a CAS just slows things down. (On x86 including x86-64, a volatile read compiles to the same asm as a plain read, except it prevents compile-time reordering).
A volatile read can't be optimized into just using a value in a register if the current thread just wrote something via a non-interlocked = assignment. That's not helpful either; if we just stored something and remember in a register what we stored, a load that does store-forwarding from the store buffer is morally equivalent to just using the register value.
Most of the good use-cases for lock-free atomics are when contention is lowish, so usually things can succeed without the hardware having to wait a long time for access / ownership of the cache line. So you want to make the uncontended case as fast as possible, and avoid volatile even if there were anything to gain from it in highly-contended cases, which I don't think there is anyway.
If you ever did any plain stores (assignments with =, not interlocked RMWs), volatile would have an effect on those, too. That might mean waiting for the store buffer to drain before later memory ops in this thread can run, if C# volatile gives semantics like C++ memory_order_seq_cst. In that case, you'd be slowing down the code involving the stores a lot, if you didn't need ordering wrt. other loads/stores. If you did such a store before this CAS code, yeah, you'd be waiting until the store (and all previous stores) were globally visible before reloading the value. That would mean the reload + CAS the CPU does right after are very likely not to have to spin, because the CPU will already have ownership of that line, but I think you'd effectively get similar behaviour from the full barrier that's part of an Interlocked CAS.
Upvotes: 5