Reputation: 9471
Java code:
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
public class SmallerCASTest {
public static void main(String[] args){
final long MAX = 500l * 1000l * 1000l;
final AtomicLong counter = new AtomicLong(0);
long start = System.nanoTime();
while (true) {
if (counter.incrementAndGet() >= MAX) {
break;
}
}
long casTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
System.out.println("Time Taken=" + casTime + "ms");
}
}
C code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define NITER 500000000
int main (){
long val = 0;
clock_t starttime = clock ();
while (val < NITER){
while (1){
long current = val;
long next = current+1;
if ( __sync_bool_compare_and_swap (&val, current, next))
break;
}
}
clock_t castime = (clock()-starttime)/ (CLOCKS_PER_SEC / 1000);
printf ("Time taken : %d ",castime);
}
run.sh
#!/bin/bash
gcc -O3 test.c -o test.o
echo -e "\nC"
./test.o
javac SmallerCASTest.java
echo -e "\nJava"
java SmallerCASTest
Other details:
System : Linux XXXXXXXXX #1 SMP Thu Mar 22 08:00:08 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
gcc --version:
gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
java -version:
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
Upvotes: 7
Views: 1438
Reputation: 37188
EDIT Indeed, java seems to implement incrementAndGet using a CAS operation, as you point out.
My testing seems to suggest that the C and Java versions have roughly equivalent performance (which makes sense, as the time consuming part is the atomic rather than any optimization of the rest that the java or C compilers manage to do).
So on my machine (Xeon X3450), the java version takes ~4700 ms, the C version ~4600 ms, a C version using __sync_add_and_fetch() ~3800 ms (suggesting that java could be improved here instead of implementing all the atomic operations on top of CAS).
java version is
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.4) (6b24-1.11.4-1ubuntu0.10.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
GCC is 4.4.3, x86_64.
OS is Ubuntu 10.04 x86_64.
So I can only conclude that something seems fishy in your tests.
Upvotes: 2
Reputation: 45433
Because Java is awesome?
The java version takes 4ns for each loop. That is about right. An uncontended CAS is actually a CPU local operation, it should be very fast. (edit: probably not 4ns fast!)
Java achieves that speed by aggressive runtime optimization, the code is inlined and becomes just a couple of machine instructions, i.e. as fast as one can hand-code in assembly.
If the gcc version couldn't inline the function call, that's a big overhead per loop.
Upvotes: 0
Reputation: 65813
You are comparing apples with oranges as I am sure you expected. The java
version is a true CAS with retry on failure while the C
version is using what I'd call in java
a synchronized
form.
See this question for more details.
See this answer to that question for supporting narrative where it says A full memory barrier is created when this function is invoked
, i.e. in java terms, this is a synchronized
call.
Try using _compare_and_swap in the same way AtomicLong
uses its java equivalent, i.e. spin on the function until the value changes to what you want it to be.
Added:
I cannot find a definitive C++ equivalent of a java AtomicLong
but that does not mean there isn't one. Essentially, an AtomicLong
can be changed by any thread at any time and just one of them succeeds. However, the change will be consistent, i.e. the change will be the result of the change by one or other of the threads, it will not be a combination of the two. If thread A attempts to change the value to 0xffff0000 (or the equivalent 64bit number) while thread B attempts a change to 0x0000ffff (ditto) the result will be either of the two values, more specifically it will not be 0x00000000 or 0xffffffff (unless of course a third thread gets involved).
Essentially, an AtomicLong
has no synchronisation at all other than this.
Upvotes: 5