javamultithreadingperformanceconcurrencysynchronization

Reputation: 6061

Performance of synchronize section in Java

I had a small dispute over performance of synchronized block in Java. This is a theoretical question, which does not affect real life application.

Consider single-thread application, which uses locks and synchronize sections. Does this code work slower than the same code without synchronize sections? If so, why? We do not discuss concurrency, since it’s only single thread application

Update

Found interesting benchmark testing it. But it's from 2001. Things could have changed dramatically in the latest version of JDK

Upvotes: 46

Answers (6)

Hailton

Reputation: 1192

This sample code (with 100 threads making 1,000,000 iterations each one) demonstrates the performance difference between avoiding and not avoiding a synchronized block.

Output:

Total time(Avoid Sync Block): 630ms
Total time(NOT Avoid Sync Block): 6360ms
Total time(Avoid Sync Block): 427ms
Total time(NOT Avoid Sync Block): 6636ms
Total time(Avoid Sync Block): 481ms
Total time(NOT Avoid Sync Block): 5882ms

Code:

import org.apache.commons.lang.time.StopWatch;

public class App {
    public static int countTheads = 100;
    public static int loopsPerThead = 1000000;
    public static int sleepOfFirst = 10;

    public static int runningCount = 0;
    public static Boolean flagSync = null;

    public static void main( String[] args )
    {        
        for (int j = 0; j < 3; j++) {     
            App.startAll(new App.AvoidSyncBlockRunner(), "(Avoid Sync Block)");
            App.startAll(new App.NotAvoidSyncBlockRunner(), "(NOT Avoid Sync Block)");
        }
    }

    public static void startAll(Runnable runnable, String description) {
        App.runningCount = 0;
        App.flagSync = null;
        Thread[] threads = new Thread[App.countTheads];

        StopWatch sw = new StopWatch();
        sw.start();
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(runnable);
        }
        for (int i = 0; i < threads.length; i++) {
            threads[i].start();
        }
        do {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        } while (runningCount != 0);
        System.out.println("Total time"+description+": " + (sw.getTime() - App.sleepOfFirst) + "ms");
    }

    public static void commonBlock() {
        String a = "foo";
        a += "Baa";
    }

    public static synchronized void incrementCountRunning(int inc) {
        runningCount = runningCount + inc;
    }

    public static class NotAvoidSyncBlockRunner implements Runnable {

        public void run() {
            App.incrementCountRunning(1);
            for (int i = 0; i < App.loopsPerThead; i++) {
                synchronized (App.class) {
                    if (App.flagSync == null) {
                        try {
                            Thread.sleep(App.sleepOfFirst);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                        App.flagSync = true;
                    }
                }
                App.commonBlock();
            }
            App.incrementCountRunning(-1);
        }
    }

    public static class AvoidSyncBlockRunner implements Runnable {

        public void run() {
            App.incrementCountRunning(1);
            for (int i = 0; i < App.loopsPerThead; i++) {
                // THIS "IF" MAY SEEM POINTLESS, BUT IT AVOIDS THE NEXT 
                //ITERATION OF ENTERING INTO THE SYNCHRONIZED BLOCK
                if (App.flagSync == null) {
                    synchronized (App.class) {
                        if (App.flagSync == null) {
                            try {
                                Thread.sleep(App.sleepOfFirst);
                            } catch (InterruptedException e) {
                                e.printStackTrace();
                            }
                            App.flagSync = true;
                        }
                    }
                }
                App.commonBlock();
            }
            App.incrementCountRunning(-1);
        }
    }
}

Upvotes: 0

Anton

Reputation: 6061

There are 3 type of locking in HotSpot

Fat: JVM relies on OS mutexes to acquire lock.
Thin: JVM is using CAS algorithm.
Biased: CAS is rather expensive operation on some of the architecture. Biased locking - is special type of locking optimized for scenario when only one thread is working on object.

By default JVM uses thin locking. Later if JVM determines that there is no contention thin locking is converted to biased locking. Operation that changes type of the lock is rather expensive, hence JVM does not apply this optimization immediately. There is special JVM option - XX:BiasedLockingStartupDelay=delay which tells JVM when this kind of optimization should be applied.

Once biased, that thread can subsequently lock and unlock the object without resorting to expensive atomic instructions.

Answer to the question: it depends. But if biased, the single threaded code with locking and without locking has average same performance.

Biased Locking in HotSpot - Dave Dice's Weblog
Synchronization and Object Locking - Thomas Kotzmann and Christian Wimmer

Upvotes: 48

NPE

Reputation: 500873

There is some overhead in acquiring a non-contested lock, but on modern JVMs it is very small.

A key run-time optimization that's relevant to this case is called "Biased Locking" and is explained in the Java SE 6 Performance White Paper.

If you wanted to have some performance numbers that are relevant to your JVM and hardware, you could construct a micro-benchmark to try and measure this overhead.

Upvotes: 20

Edward Thomson

Reputation: 78793

Single-threaded code will still run slower when using synchronized blocks. Obviously you will not have other threads stalled while waiting for other threads to finish, however you will have to deal with the other effects of synchronization, namely cache coherency.

Synchronized blocks are not only used for concurrency, but also visibility. Every synchronized block is a memory barrier: the JVM is free to work on variables in registers, instead of main memory, on the assumption that multiple threads will not access that variable. Without synchronization blocks, this data could be stored in a CPU's cache and different threads on different CPUs would not see the same data. By using a synchronization block, you force the JVM to write this data to main memory for visibility to other threads.

So even though you're free from lock contention, the JVM will still have to do housekeeping in flushing data to main memory.

In addition, this has optimization constraints. The JVM is free to reorder instructions in order to provide optimization: consider a simple example:

foo++;
bar++;

versus:

foo++;
synchronized(obj)
{
    bar++;
}

In the first example, the compiler is free to load foo and bar at the same time, then increment them both, then save them both. In the second example, the compiler must perform the load/add/save on foo, then perform the load/add/save on bar. Thus, synchronization may impact the ability of the JRE to optimize instructions.

(An excellent book on the Java Memory Model is Brian Goetz's Java Concurrency In Practice.)

Upvotes: 62

Peter Lawrey

Reputation: 533790

Using locks when you don't need to will slow down your application. It could be too small to measure or it could be surprisingly high.

IMHO Often the best approach is to use lock free code in a single threaded program to make it clear this code is not intended to be shared across thread. This could be more important for maintenance than any performance issues.

public static void main(String... args) throws IOException {
    for (int i = 0; i < 3; i++) {
        perfTest(new Vector<Integer>());
        perfTest(new ArrayList<Integer>());
    }
}

private static void perfTest(List<Integer> objects) {
    long start = System.nanoTime();
    final int runs = 100000000;
    for (int i = 0; i < runs; i += 20) {
        // add items.
        for (int j = 0; j < 20; j+=2)
            objects.add(i);
        // remove from the end.
        while (!objects.isEmpty())
            objects.remove(objects.size() - 1);
    }
    long time = System.nanoTime() - start;
    System.out.printf("%s each add/remove took an average of %.1f ns%n", objects.getClass().getSimpleName(),  (double) time/runs);
}

prints

Vector each add/remove took an average of 38.9 ns
ArrayList each add/remove took an average of 6.4 ns
Vector each add/remove took an average of 10.5 ns
ArrayList each add/remove took an average of 6.2 ns
Vector each add/remove took an average of 10.4 ns
ArrayList each add/remove took an average of 5.7 ns

From a performance point of view, if 4 ns is important to you, you have to use the non-synchronized version.

For 99% of use cases, the clarity of the code is more important than performance. Clear, simple code often performs reasonably good as well.

BTW: I am using a 4.6 GHz i7 2600 with Oracle Java 7u1.

For comparison if I do the following where perfTest1,2,3 are identical.

    perfTest1(new ArrayList<Integer>());
    perfTest2(new Vector<Integer>());
    perfTest3(Collections.synchronizedList(new ArrayList<Integer>()));

I get

ArrayList each add/remove took an average of 2.6 ns
Vector each add/remove took an average of 7.5 ns
SynchronizedRandomAccessList each add/remove took an average of 8.9 ns

If I use a common perfTest method it cannot inline the code as optimally and they are all slower

ArrayList each add/remove took an average of 9.3 ns
Vector each add/remove took an average of 12.4 ns
SynchronizedRandomAccessList each add/remove took an average of 13.9 ns

Swapping the order of tests

ArrayList each add/remove took an average of 3.0 ns
Vector each add/remove took an average of 39.7 ns
ArrayList each add/remove took an average of 2.0 ns
Vector each add/remove took an average of 4.6 ns
ArrayList each add/remove took an average of 2.3 ns
Vector each add/remove took an average of 4.5 ns
ArrayList each add/remove took an average of 2.3 ns
Vector each add/remove took an average of 4.4 ns
ArrayList each add/remove took an average of 2.4 ns
Vector each add/remove took an average of 4.6 ns

one at a time

ArrayList each add/remove took an average of 3.0 ns
ArrayList each add/remove took an average of 3.0 ns
ArrayList each add/remove took an average of 2.3 ns
ArrayList each add/remove took an average of 2.2 ns
ArrayList each add/remove took an average of 2.4 ns

and

Vector each add/remove took an average of 28.4 ns
Vector each add/remove took an average of 37.4 ns
Vector each add/remove took an average of 7.6 ns
Vector each add/remove took an average of 7.6 ns
Vector each add/remove took an average of 7.6 ns

Upvotes: 10

sworisbreathing

Reputation: 730

Assuming you're using the HotSpot VM, I believe the JVM is able to recognize that there is no contention for any resources within the synchronized block and treat it as "normal" code.

Upvotes: 0

Performance of synchronize section in Java

Answers (6)

Related Questions