Reputation: 6041
I had a small dispute over performance of synchronized
block in Java. This is a theoretical question, which does not affect real life application.
Consider single-thread application, which uses locks and synchronize sections. Does this code work slower than the same code without synchronize sections? If so, why? We do not discuss concurrency, since it’s only single thread application
Update
Found interesting benchmark testing it. But it's from 2001. Things could have changed dramatically in the latest version of JDK
Upvotes: 46
Views: 38755
Reputation: 1192
This sample code (with 100 threads making 1,000,000 iterations each one) demonstrates the performance difference between avoiding and not avoiding a synchronized block.
Output:
Total time(Avoid Sync Block): 630ms
Total time(NOT Avoid Sync Block): 6360ms
Total time(Avoid Sync Block): 427ms
Total time(NOT Avoid Sync Block): 6636ms
Total time(Avoid Sync Block): 481ms
Total time(NOT Avoid Sync Block): 5882ms
Code:
import org.apache.commons.lang.time.StopWatch;
public class App {
public static int countTheads = 100;
public static int loopsPerThead = 1000000;
public static int sleepOfFirst = 10;
public static int runningCount = 0;
public static Boolean flagSync = null;
public static void main( String[] args )
{
for (int j = 0; j < 3; j++) {
App.startAll(new App.AvoidSyncBlockRunner(), "(Avoid Sync Block)");
App.startAll(new App.NotAvoidSyncBlockRunner(), "(NOT Avoid Sync Block)");
}
}
public static void startAll(Runnable runnable, String description) {
App.runningCount = 0;
App.flagSync = null;
Thread[] threads = new Thread[App.countTheads];
StopWatch sw = new StopWatch();
sw.start();
for (int i = 0; i < threads.length; i++) {
threads[i] = new Thread(runnable);
}
for (int i = 0; i < threads.length; i++) {
threads[i].start();
}
do {
try {
Thread.sleep(10);
} catch (InterruptedException e) {
e.printStackTrace();
}
} while (runningCount != 0);
System.out.println("Total time"+description+": " + (sw.getTime() - App.sleepOfFirst) + "ms");
}
public static void commonBlock() {
String a = "foo";
a += "Baa";
}
public static synchronized void incrementCountRunning(int inc) {
runningCount = runningCount + inc;
}
public static class NotAvoidSyncBlockRunner implements Runnable {
public void run() {
App.incrementCountRunning(1);
for (int i = 0; i < App.loopsPerThead; i++) {
synchronized (App.class) {
if (App.flagSync == null) {
try {
Thread.sleep(App.sleepOfFirst);
} catch (InterruptedException e) {
e.printStackTrace();
}
App.flagSync = true;
}
}
App.commonBlock();
}
App.incrementCountRunning(-1);
}
}
public static class AvoidSyncBlockRunner implements Runnable {
public void run() {
App.incrementCountRunning(1);
for (int i = 0; i < App.loopsPerThead; i++) {
// THIS "IF" MAY SEEM POINTLESS, BUT IT AVOIDS THE NEXT
//ITERATION OF ENTERING INTO THE SYNCHRONIZED BLOCK
if (App.flagSync == null) {
synchronized (App.class) {
if (App.flagSync == null) {
try {
Thread.sleep(App.sleepOfFirst);
} catch (InterruptedException e) {
e.printStackTrace();
}
App.flagSync = true;
}
}
}
App.commonBlock();
}
App.incrementCountRunning(-1);
}
}
}
Upvotes: 0
Reputation: 6041
There are 3 type of locking in HotSpot
By default JVM uses thin locking. Later if JVM determines that there is no contention thin locking is converted to biased locking. Operation that changes type of the lock is rather expensive, hence JVM does not apply this optimization immediately. There is special JVM option - XX:BiasedLockingStartupDelay=delay which tells JVM when this kind of optimization should be applied.
Once biased, that thread can subsequently lock and unlock the object without resorting to expensive atomic instructions.
Answer to the question: it depends. But if biased, the single threaded code with locking and without locking has average same performance.
Upvotes: 48
Reputation: 500167
There is some overhead in acquiring a non-contested lock, but on modern JVMs it is very small.
A key run-time optimization that's relevant to this case is called "Biased Locking" and is explained in the Java SE 6 Performance White Paper.
If you wanted to have some performance numbers that are relevant to your JVM and hardware, you could construct a micro-benchmark to try and measure this overhead.
Upvotes: 20
Reputation: 78623
Single-threaded code will still run slower when using synchronized
blocks. Obviously you will not have other threads stalled while waiting for other threads to finish, however you will have to deal with the other effects of synchronization, namely cache coherency.
Synchronized blocks are not only used for concurrency, but also visibility. Every synchronized block is a memory barrier: the JVM is free to work on variables in registers, instead of main memory, on the assumption that multiple threads will not access that variable. Without synchronization blocks, this data could be stored in a CPU's cache and different threads on different CPUs would not see the same data. By using a synchronization block, you force the JVM to write this data to main memory for visibility to other threads.
So even though you're free from lock contention, the JVM will still have to do housekeeping in flushing data to main memory.
In addition, this has optimization constraints. The JVM is free to reorder instructions in order to provide optimization: consider a simple example:
foo++;
bar++;
versus:
foo++;
synchronized(obj)
{
bar++;
}
In the first example, the compiler is free to load foo
and bar
at the same time, then increment them both, then save them both. In the second example, the compiler must perform the load/add/save on foo
, then perform the load/add/save on bar
. Thus, synchronization may impact the ability of the JRE to optimize instructions.
(An excellent book on the Java Memory Model is Brian Goetz's Java Concurrency In Practice.)
Upvotes: 62
Reputation: 533442
Using locks when you don't need to will slow down your application. It could be too small to measure or it could be surprisingly high.
IMHO Often the best approach is to use lock free code in a single threaded program to make it clear this code is not intended to be shared across thread. This could be more important for maintenance than any performance issues.
public static void main(String... args) throws IOException {
for (int i = 0; i < 3; i++) {
perfTest(new Vector<Integer>());
perfTest(new ArrayList<Integer>());
}
}
private static void perfTest(List<Integer> objects) {
long start = System.nanoTime();
final int runs = 100000000;
for (int i = 0; i < runs; i += 20) {
// add items.
for (int j = 0; j < 20; j+=2)
objects.add(i);
// remove from the end.
while (!objects.isEmpty())
objects.remove(objects.size() - 1);
}
long time = System.nanoTime() - start;
System.out.printf("%s each add/remove took an average of %.1f ns%n", objects.getClass().getSimpleName(), (double) time/runs);
}
prints
Vector each add/remove took an average of 38.9 ns
ArrayList each add/remove took an average of 6.4 ns
Vector each add/remove took an average of 10.5 ns
ArrayList each add/remove took an average of 6.2 ns
Vector each add/remove took an average of 10.4 ns
ArrayList each add/remove took an average of 5.7 ns
From a performance point of view, if 4 ns is important to you, you have to use the non-synchronized version.
For 99% of use cases, the clarity of the code is more important than performance. Clear, simple code often performs reasonably good as well.
BTW: I am using a 4.6 GHz i7 2600 with Oracle Java 7u1.
For comparison if I do the following where perfTest1,2,3 are identical.
perfTest1(new ArrayList<Integer>());
perfTest2(new Vector<Integer>());
perfTest3(Collections.synchronizedList(new ArrayList<Integer>()));
I get
ArrayList each add/remove took an average of 2.6 ns
Vector each add/remove took an average of 7.5 ns
SynchronizedRandomAccessList each add/remove took an average of 8.9 ns
If I use a common perfTest
method it cannot inline the code as optimally and they are all slower
ArrayList each add/remove took an average of 9.3 ns
Vector each add/remove took an average of 12.4 ns
SynchronizedRandomAccessList each add/remove took an average of 13.9 ns
Swapping the order of tests
ArrayList each add/remove took an average of 3.0 ns
Vector each add/remove took an average of 39.7 ns
ArrayList each add/remove took an average of 2.0 ns
Vector each add/remove took an average of 4.6 ns
ArrayList each add/remove took an average of 2.3 ns
Vector each add/remove took an average of 4.5 ns
ArrayList each add/remove took an average of 2.3 ns
Vector each add/remove took an average of 4.4 ns
ArrayList each add/remove took an average of 2.4 ns
Vector each add/remove took an average of 4.6 ns
one at a time
ArrayList each add/remove took an average of 3.0 ns
ArrayList each add/remove took an average of 3.0 ns
ArrayList each add/remove took an average of 2.3 ns
ArrayList each add/remove took an average of 2.2 ns
ArrayList each add/remove took an average of 2.4 ns
and
Vector each add/remove took an average of 28.4 ns
Vector each add/remove took an average of 37.4 ns
Vector each add/remove took an average of 7.6 ns
Vector each add/remove took an average of 7.6 ns
Vector each add/remove took an average of 7.6 ns
Upvotes: 10
Reputation: 730
Assuming you're using the HotSpot VM, I believe the JVM is able to recognize that there is no contention for any resources within the synchronized
block and treat it as "normal" code.
Upvotes: 0