user592748

Reputation: 1234

Performance comparison between compare-and-swap and blocking algorithm

I use a ConcurrentLinkedQueue as the underlying data structure. On every put call, I add a unique, incremented value to the list. I have both a synchronized and a compare-and-swap (CAS) version of this method. With few threads (e.g., 5) doing 10 million puts in total, the synchronized version performs much better. With many threads (e.g., 2000) doing the same total number of puts, the CAS version performs much better. Why does CAS underperform the blocking algorithm when there are fewer threads?

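    // latestValue is an AtomicReference<Foo> that has already been initialized
    public void put(Double value) {
        Foo currentValue;
        while (true) {
            // Snapshot the current reference, then try to swap in a fresh Foo.
            currentValue = latestValue.get();
            Foo newValue = new Foo(value);
            if (latestValue.compareAndSet(currentValue, newValue)) {
                // The CAS succeeded, so record the value that was just published.
                historyList.add(newValue);
                return;
            }
            // The CAS failed because another thread got there first; retry.
        }
    }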

Statistics

NON-BLOCKING
Threads 2000
Puts per thread 10000
Put time average    208493309

BLOCKING
Threads 2000
Puts per thread 10000
Put time average    2370823534


NON-BLOCKING
Threads 2
Puts per thread 10000000
Put time average    13117487385

BLOCKING
Threads 2
Puts per thread 10000000
Put time average    4201127857

Upvotes: 1

Views: 445

Answers (1)

qwwdfsad

Reputation: 3207

TL;DR: in the uncontended case the JVM optimizes synchronized and replaces the heavyweight lock with a CAS-based lock.

Your CAS version carries overhead: it does some work on every attempt, even when the CAS fails. Of course, that is nothing compared to acquiring a real mutex, which is what usually happens when you use synchronized.
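To illustrate the per-attempt overhead, here is a minimal sketch of the same loop with the allocation hoisted out of the retry path, so a failed CAS only repeats get() and compareAndSet(). The class name CasRecorder, the nested Foo stand-in, and the runDemo helper are my assumptions for illustration, not code from the question.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical wrapper class; the question only shows the put method.
class CasRecorder {
    // Hypothetical stand-in for the question's Foo: an immutable value holder.
    static final class Foo {
        final double value;
        Foo(double value) { this.value = value; }
    }

    private final AtomicReference<Foo> latestValue = new AtomicReference<>(new Foo(0.0));
    private final ConcurrentLinkedQueue<Foo> historyList = new ConcurrentLinkedQueue<>();

    public void put(double value) {
        // Allocate once, before the loop. A failed CAS then costs only
        // another get() and compareAndSet(), not another allocation.
        Foo newValue = new Foo(value);
        while (true) {
            Foo currentValue = latestValue.get();
            if (latestValue.compareAndSet(currentValue, newValue)) {
                historyList.add(newValue);
                return;
            }
        }
    }

    public double latest() { return latestValue.get().value; }

    public int historySize() { return historyList.size(); }

    // Starts `threads` workers that each call put `putsPerThread` times,
    // waits for them, and returns the number of recorded values.
    static int runDemo(int threads, int putsPerThread) {
        CasRecorder recorder = new CasRecorder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < putsPerThread; i++) {
                    recorder.put(i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return recorder.historySize();
    }
}
```

Hoisting is safe here because the new object does not depend on the value read by get(). If Foo had to be derived from currentValue, the allocation would have to stay inside the loop.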

But the JVM isn't stupid: when it sees that the lock you are acquiring is uncontended, it replaces the real mutex with a CAS-based lock (or even a simple store, in the case of biased locking). So with two threads, the synchronized version measures just a CAS, whereas your own CAS implementation also measures the time to allocate the Foo instance, plus compareAndSet and get().
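The question does not show the blocking version, so the following is only a sketch of what it might look like, assuming it guards the same two steps with a synchronized block. BlockingRecorder, the nested Foo stand-in, and runDemo are hypothetical names. When the lock is uncontended, entering the block is the path the JVM can turn into a single CAS.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical blocking counterpart; not taken from the question.
class BlockingRecorder {
    // Hypothetical stand-in for the question's Foo.
    static final class Foo {
        final double value;
        Foo(double value) { this.value = value; }
    }

    private final Object lock = new Object();
    private Foo latestValue = new Foo(0.0);
    private final ConcurrentLinkedQueue<Foo> historyList = new ConcurrentLinkedQueue<>();

    public void put(double value) {
        Foo newValue = new Foo(value);
        // No retry loop: a thread either enters the block or waits for it.
        synchronized (lock) {
            latestValue = newValue;
            historyList.add(newValue);
        }
    }

    public double latest() {
        synchronized (lock) {
            return latestValue.value;
        }
    }

    public int historySize() { return historyList.size(); }

    // Starts `threads` workers that each call put `putsPerThread` times,
    // waits for them, and returns the number of recorded values.
    static int runDemo(int threads, int putsPerThread) {
        BlockingRecorder recorder = new BlockingRecorder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < putsPerThread; i++) {
                    recorder.put(i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return recorder.historySize();
    }
}
```

Note that biased locking was deprecated and disabled by default in JDK 15 (JEP 374), so on recent JVMs the uncontended path is the CAS-based lightweight lock rather than a plain store.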

With 2000 threads the lock is contended, so the JVM doesn't apply this CAS optimization, and your implementation outperforms mutex acquisition as expected.

Upvotes: 3
