Reputation: 9481

Java CAS operation performs faster than C equivalent, why?

Here I have Java and C code that tries to do an Atomic increment operation using CAS.
To increment an long variable from 0 to 500,000,000.
C : Time taken : 7300ms
Java : Time Taken : 2083ms
Can any one double check these results? Because I just can't believe them.
Thanks

Java code:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SmallerCASTest {

    public static void main(String[] args){
        final long MAX = 500l * 1000l * 1000l;
        final AtomicLong counter = new AtomicLong(0);

        long start = System.nanoTime();
        while (true) {
            if (counter.incrementAndGet() >= MAX) {
                break;
            }
        }

        long casTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("Time Taken=" + casTime + "ms");
    }

}

C code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define NITER 500000000


int main (){
   long val = 0;     
   clock_t starttime = clock ();
    while (val < NITER){
      while (1){
        long current = val;
        long next = current+1;
        if ( __sync_bool_compare_and_swap (&val, current, next))
            break;
      }
     } 
   clock_t castime = (clock()-starttime)/ (CLOCKS_PER_SEC / 1000);
   printf ("Time taken : %d ",castime);
}

run.sh

#!/bin/bash

gcc -O3 test.c -o test.o
echo -e "\nC"
./test.o
javac SmallerCASTest.java
echo -e "\nJava"
java SmallerCASTest

Other details:

System : Linux XXXXXXXXX #1 SMP Thu Mar 22 08:00:08 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

gcc --version:
 gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)

java -version: 
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

Upvotes: 7

Answers (3)

janneb

Reputation: 37248

EDIT Indeed, java seems to implement incrementAndGet using a CAS operation, as you point out.

My testing seems to suggest that the C and Java versions have roughly equivalent performance (which makes sense, as the time consuming part is the atomic rather than any optimization of the rest that the java or C compilers manage to do).

So on my machine (Xeon X3450), the java version takes ~4700 ms, the C version ~4600 ms, a C version using __sync_add_and_fetch() ~3800 ms (suggesting that java could be improved here instead of implementing all the atomic operations on top of CAS).

java version is


java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.4) (6b24-1.11.4-1ubuntu0.10.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

GCC is 4.4.3, x86_64.

OS is Ubuntu 10.04 x86_64.

So I can only conclude that something seems fishy in your tests.

Upvotes: 2

irreputable

Reputation: 45463

Because Java is awesome?

The java version takes 4ns for each loop. That is about right. An uncontended CAS is actually a CPU local operation, it should be very fast. (edit: probably not 4ns fast!)

Java achieves that speed by aggressive runtime optimization, the code is inlined and becomes just a couple of machine instructions, i.e. as fast as one can hand-code in assembly.

If the gcc version couldn't inline the function call, that's a big overhead per loop.

Upvotes: 0

OldCurmudgeon

Reputation: 65889

You are comparing apples with oranges as I am sure you expected. The java version is a true CAS with retry on failure while the C version is using what I'd call in java a synchronized form.

See this question for more details.

See this answer to that question for supporting narrative where it says A full memory barrier is created when this function is invoked, i.e. in java terms, this is a synchronized call.

Try using _compare_and_swap in the same way AtomicLong uses its java equivalent, i.e. spin on the function until the value changes to what you want it to be.

Added:

I cannot find a definitive C++ equivalent of a java AtomicLong but that does not mean there isn't one. Essentially, an AtomicLong can be changed by any thread at any time and just one of them succeeds. However, the change will be consistent, i.e. the change will be the result of the change by one or other of the threads, it will not be a combination of the two. If thread A attempts to change the value to 0xffff0000 (or the equivalent 64bit number) while thread B attempts a change to 0x0000ffff (ditto) the result will be either of the two values, more specifically it will not be 0x00000000 or 0xffffffff (unless of course a third thread gets involved).

Essentially, an AtomicLong has no synchronisation at all other than this.

Upvotes: 5

Java CAS operation performs faster than C equivalent, why?

Answers (3)

Related Questions