Reputation: 1234
I have a ConcurrentLinkedQueue that I use as the underlying data structure. On every put call, I add a unique incremented value to the list. I have both a synchronized and a compare-and-swap (CAS) version of this method. With few threads (e.g., 5) doing 10 million puts in total, the synchronized version performs much better. With many threads (e.g., 2000) doing the same total number of puts, the CAS version performs much better. Why does CAS underperform the blocking algorithm when there are fewer threads?
// Fields (from the description above): AtomicReference<Foo> latestValue, already initialized,
// and historyList, the ConcurrentLinkedQueue used as the underlying data structure.
public void put(Double value) {
    Foo currentValue;
    while (true) {
        currentValue = latestValue.get();
        Foo newValue = new Foo(value);
        if (latestValue.compareAndSet(currentValue, newValue)) {
            historyList.add(newValue);
            return;
        }
    }
}
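The blocking version is not shown above; a minimal sketch of what it might look like, assuming the same latestValue and historyList fields:

// Hypothetical blocking variant, sketched for comparison.
public synchronized void put(Double value) {
    Foo newValue = new Foo(value);   // create the new value object
    latestValue.set(newValue);       // publish it as the latest value
    historyList.add(newValue);       // append it to the history queue
}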
Statistics

               Threads   Puts per thread   Put time average
NON-BLOCKING   2000      10000             208493309
BLOCKING       2000      10000             2370823534
NON-BLOCKING   2         10000000          13117487385
BLOCKING       2         10000000          4201127857
Upvotes: 1
Views: 445
Reputation: 3207
TL;DR: because in the uncontended case the JVM will optimize synchronized and replace it with a CAS lock.
In your CAS case you have extra overhead: you do some work even when the CAS is going to fail. Of course, that is nothing compared to acquiring a real mutex, which is what usually happens when you use synchronized.
But the JVM isn't stupid: when it sees that the lock you are acquiring is uncontended, it simply replaces the real mutex with a CAS lock (or even with a plain store in the case of biased locking).
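Roughly speaking, such a "CAS lock" is just a flag acquired with a single compare-and-set. A conceptual sketch using AtomicBoolean (an analogue only, not how the HotSpot thin lock on the object header is actually implemented):

import java.util.concurrent.atomic.AtomicBoolean;

// Conceptual analogue of a CAS lock, for illustration only.
class CasLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // spin until the CAS on the flag succeeds
        while (!locked.compareAndSet(false, true)) { }
    }

    void unlock() {
        locked.set(false);   // release by resetting the flag
    }
}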
So for two threads, with synchronized you are measuring just a CAS, but with your own CAS implementation you are also measuring the time to allocate a Foo instance, to call compareAndSet(), and to call get().
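As a side note, since new Foo(value) does not depend on the value read from latestValue, a variant that hoists the allocation out of the retry loop would look roughly like this, so a failed attempt only repeats get() and compareAndSet():

public void put(Double value) {
    Foo newValue = new Foo(value);        // allocate once, outside the retry loop
    Foo currentValue;
    do {
        currentValue = latestValue.get(); // read the current latest value
    } while (!latestValue.compareAndSet(currentValue, newValue));
    historyList.add(newValue);            // record the successfully published value
}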
For 2000 threads the JVM does not perform this CAS optimization, so your implementation outperforms mutex acquisition, as expected.
Upvotes: 3