Reputation: 27455
I'm trying to understand the difference between synchronized blocks and synchronized methods by example. Consider the following simple class:
public class Main {
    private static final Object lock = new Object();
    private static long l;

    public static void main(String[] args) {
    }

    public static void action() {
        synchronized (lock) {
            l = (l + 1) * 2;
            System.out.println(l);
        }
    }
}
The compiled Main::action() will look as follows:
public static void action();
Code:
0: getstatic #2 // Field lock:Ljava/lang/Object;
3: dup
4: astore_0
5: monitorenter // <---- ENTERING
6: getstatic #3 // Field l:J
9: lconst_1
10: ladd
11: ldc2_w #4 // long 2l
14: lmul
15: putstatic #3 // Field l:J
18: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
21: getstatic #3 // Field l:J
24: invokevirtual #7 // Method java/io/PrintStream.println:(J)V
27: aload_0
28: monitorexit // <---- EXITING
29: goto 37
32: astore_1
33: aload_0
34: monitorexit // <---- EXITING TWICE????
35: aload_1
36: athrow
37: return
I thought it was better to use synchronized blocks than synchronized methods because a block provides more encapsulation, preventing clients from affecting the synchronization policy (with a synchronized method, any client can acquire the lock and thereby affect the synchronization policy). But from a performance standpoint they seemed to me pretty much the same. Now consider the synchronized-method version:
public static synchronized void action() {
    l = (l + 1) * 2;
    System.out.println(l);
}
public static synchronized void action();
Code:
0: getstatic #2 // Field l:J
3: lconst_1
4: ladd
5: ldc2_w #3 // long 2l
8: lmul
9: putstatic #2 // Field l:J
12: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;
15: getstatic #2 // Field l:J
18: invokevirtual #6 // Method java/io/PrintStream.println:(J)V
21: return
So the synchronized-method version has far fewer instructions to execute, which suggests it should be faster.
QUESTION: Is a synchronized method faster than a synchronized block?
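To illustrate the encapsulation point: with a static synchronized method the monitor is the Class object itself, which any client can grab. The sketch below (class and method names are mine, for illustration) shows a client stalling the method simply by holding that lock:

```java
public class LockLeak {
    // A static synchronized method locks on LockLeak.class itself.
    static synchronized void action() { }

    // Measures how long action() is blocked while a "client" holds the class lock.
    static long blockedMillis() throws InterruptedException {
        Thread hog = new Thread(() -> {
            synchronized (LockLeak.class) { // any client can take the same monitor
                try { Thread.sleep(500); } catch (InterruptedException ignored) { }
            }
        });
        hog.start();
        Thread.sleep(100);                  // give the hog time to acquire the lock
        long start = System.nanoTime();
        action();                           // blocks until the hog releases LockLeak.class
        hog.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("action() was blocked for ~" + blockedMillis() + " ms");
    }
}
```

With a private lock object, no outside code could interfere like this.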
Upvotes: 4
Views: 1177
Reputation: 6787
A quick test using the Java code posted at the bottom of this answer showed the synchronized method being faster. Running the code on a Windows JVM on an i7 gave the following averages:
synchronized block: 0.004254 s
synchronized method: 0.001056 s
This would imply that the synchronized method is indeed faster, in line with your byte-code assessment.
What confused me, however, was the stark difference between the two times. I would have presumed that the JVM would still take a lock for the underlying synchronized method and that the difference in times would be negligible, but that was not the case. Since the Oracle JVM is closed source, I took a look at the OpenJDK HotSpot JVM source and dug into the byte-code interpreter that handles synchronized methods and blocks. To reiterate: the following JVM code is from the OpenJDK, but I would presume the official JVM does something similar.
When a .class file is built, if a method is synchronized the compiler sets the ACC_SYNCHRONIZED access flag on that method (just as flags are recorded when the method is static/public/final/varargs, etc.), and the underlying JVM code sets a flag on the method structure to this effect.
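That flag is visible from Java through reflection. A small sketch (class and method names are mine, for illustration) confirming that a synchronized method carries it:

```java
import java.lang.reflect.Modifier;

public class FlagCheck {
    public static synchronized void syncMethod() { }
    public static void plainMethod() { }

    // Reports whether the synchronized flag is set on the named method of this class.
    static boolean isSync(String name) {
        try {
            return Modifier.isSynchronized(FlagCheck.class.getMethod(name).getModifiers());
        } catch (NoSuchMethodException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("syncMethod synchronized? " + isSync("syncMethod"));   // true
        System.out.println("plainMethod synchronized? " + isSync("plainMethod")); // false
    }
}
```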
When the byte-code interpreter hits a method-invocation byte code, the following code runs before the method is invoked to check whether it needs to be locked:
case method_entry: {
  /* CODE_EDIT: irrelevant code removed for brevity's sake */

  // lock method if synchronized
  if (METHOD->is_synchronized()) {
    // oop rcvr = locals[0].j.r;
    oop rcvr;
    if (METHOD->is_static()) {
      rcvr = METHOD->constants()->pool_holder()->java_mirror();
    } else {
      rcvr = LOCALS_OBJECT(0);
      VERIFY_OOP(rcvr);
    }
    // The initial monitor is ours for the taking
    BasicObjectLock* mon = &istate->monitor_base()[-1];
    oop monobj = mon->obj();
    assert(mon->obj() == rcvr, "method monitor mis-initialized");
    bool success = UseBiasedLocking;
    if (UseBiasedLocking) {
      /* CODE_EDIT: this code is only run if you have biased locking enabled as a JVM option */
    }
    if (!success) {
      markOop displaced = rcvr->mark()->set_unlocked();
      mon->lock()->set_displaced_header(displaced);
      if (Atomic::cmpxchg_ptr(mon, rcvr->mark_addr(), displaced) != displaced) {
        // Is it simple recursive case?
        if (THREAD->is_lock_owned((address) displaced->clear_lock_bits())) {
          mon->lock()->set_displaced_header(NULL);
        } else {
          CALL_VM(InterpreterRuntime::monitorenter(THREAD, mon), handle_exception);
        }
      }
    }
  }

  /* CODE_EDIT: irrelevant code removed for brevity's sake */
  goto run;
}
Then, when the method completes and control returns to the JVM function handler, the following code is called to unlock the method (note that the boolean method_unlock_needed is set to bool method_unlock_needed = METHOD->is_synchronized() before the method is invoked):
if (method_unlock_needed) {
  if (base->obj() == NULL) {
    /* CODE_EDIT: irrelevant code removed for brevity's sake */
  } else {
    oop rcvr = base->obj();
    if (rcvr == NULL) {
      if (!suppress_error) {
        VM_JAVA_ERROR_NO_JUMP(vmSymbols::java_lang_NullPointerException(), "");
        illegal_state_oop = THREAD->pending_exception();
        THREAD->clear_pending_exception();
      }
    } else {
      BasicLock* lock = base->lock();
      markOop header = lock->displaced_header();
      base->set_obj(NULL);
      // If it isn't recursive we either must swap old header or call the runtime
      if (header != NULL) {
        if (Atomic::cmpxchg_ptr(header, rcvr->mark_addr(), lock) != lock) {
          // restore object for the slow case
          base->set_obj(rcvr);
          {
            // Prevent any HandleMarkCleaner from freeing our live handles
            HandleMark __hm(THREAD);
            CALL_VM_NOCHECK(InterpreterRuntime::monitorexit(THREAD, base));
          }
          if (THREAD->has_pending_exception()) {
            if (!suppress_error) illegal_state_oop = THREAD->pending_exception();
            THREAD->clear_pending_exception();
          }
        }
      }
    }
  }
}
The statements CALL_VM(InterpreterRuntime::monitorenter(THREAD, mon), handle_exception); and CALL_VM_NOCHECK(InterpreterRuntime::monitorexit(THREAD, base));, and more specifically the functions InterpreterRuntime::monitorenter and InterpreterRuntime::monitorexit, are what the JVM calls for both synchronized methods and synchronized blocks to lock/unlock the underlying objects. The run label in the code jumps to the massive byte-code interpreter switch statement that handles the different byte codes being parsed.
From here, if a synchronized-block opcode (the monitorenter and monitorexit byte codes) is encountered, the following case statements are run (for monitorenter and monitorexit respectively):
CASE(_monitorenter): {
  oop lockee = STACK_OBJECT(-1);
  // derefing's lockee ought to provoke implicit null check
  CHECK_NULL(lockee);
  // find a free monitor or one already allocated for this object
  // if we find a matching object then we need a new monitor
  // since this is recursive enter
  BasicObjectLock* limit = istate->monitor_base();
  BasicObjectLock* most_recent = (BasicObjectLock*) istate->stack_base();
  BasicObjectLock* entry = NULL;
  while (most_recent != limit ) {
    if (most_recent->obj() == NULL) entry = most_recent;
    else if (most_recent->obj() == lockee) break;
    most_recent++;
  }
  if (entry != NULL) {
    entry->set_obj(lockee);
    markOop displaced = lockee->mark()->set_unlocked();
    entry->lock()->set_displaced_header(displaced);
    if (Atomic::cmpxchg_ptr(entry, lockee->mark_addr(), displaced) != displaced) {
      // Is it simple recursive case?
      if (THREAD->is_lock_owned((address) displaced->clear_lock_bits())) {
        entry->lock()->set_displaced_header(NULL);
      } else {
        CALL_VM(InterpreterRuntime::monitorenter(THREAD, entry), handle_exception);
      }
    }
    UPDATE_PC_AND_TOS_AND_CONTINUE(1, -1);
  } else {
    istate->set_msg(more_monitors);
    UPDATE_PC_AND_RETURN(0); // Re-execute
  }
}
CASE(_monitorexit): {
  oop lockee = STACK_OBJECT(-1);
  CHECK_NULL(lockee);
  // derefing's lockee ought to provoke implicit null check
  // find our monitor slot
  BasicObjectLock* limit = istate->monitor_base();
  BasicObjectLock* most_recent = (BasicObjectLock*) istate->stack_base();
  while (most_recent != limit ) {
    if ((most_recent)->obj() == lockee) {
      BasicLock* lock = most_recent->lock();
      markOop header = lock->displaced_header();
      most_recent->set_obj(NULL);
      // If it isn't recursive we either must swap old header or call the runtime
      if (header != NULL) {
        if (Atomic::cmpxchg_ptr(header, lockee->mark_addr(), lock) != lock) {
          // restore object for the slow case
          most_recent->set_obj(lockee);
          CALL_VM(InterpreterRuntime::monitorexit(THREAD, most_recent), handle_exception);
        }
      }
      UPDATE_PC_AND_TOS_AND_CONTINUE(1, -1);
    }
    most_recent++;
  }
  // Need to throw illegal monitor state exception
  CALL_VM(InterpreterRuntime::throw_illegal_monitor_state_exception(THREAD), handle_exception);
  ShouldNotReachHere();
}
Again, the same InterpreterRuntime::monitorenter and InterpreterRuntime::monitorexit functions are called to lock the underlying objects, but with much more overhead along the way, which explains the difference in times between a synchronized method and a synchronized block.
Obviously both the synchronized method and the synchronized block have their pros and cons, but the question asked which is faster, and based on the preliminary test and the OpenJDK source it would appear that a synchronized method (alone) is indeed faster than a synchronized block (alone). Your results may vary, though (especially as the code grows more complex), so if performance is a concern it's best to run your own tests and gauge from there what makes sense for your case.
And here's the relevant Java test code:
public class Main
{
    public static final Object lock = new Object();
    private static long l = 0;

    public static void SyncLock()
    {
        synchronized (lock) {
            ++l;
        }
    }

    public static synchronized void SyncFunction()
    {
        ++l;
    }

    public static class ThreadSyncLock implements Runnable
    {
        @Override
        public void run()
        {
            for (int i = 0; i < 10000; ++i) {
                SyncLock();
            }
        }
    }

    public static class ThreadSyncFn implements Runnable
    {
        @Override
        public void run()
        {
            for (int i = 0; i < 10000; ++i) {
                SyncFunction();
            }
        }
    }

    public static void main(String[] args)
    {
        l = 0;
        try {
            java.util.ArrayList<Thread> threads = new java.util.ArrayList<Thread>();
            long start, end;
            double avg1 = 0, avg2 = 0;
            for (int x = 0; x < 1000; ++x) {
                threads.clear();
                for (int i = 0; i < 8; ++i) { threads.add(new Thread(new ThreadSyncLock())); }
                start = System.currentTimeMillis();
                for (int i = 0; i < 8; ++i) { threads.get(i).start(); }
                for (int i = 0; i < 8; ++i) { threads.get(i).join(); }
                end = System.currentTimeMillis();
                avg1 += ((end - start) / 1000f);
                l = 0;
                threads.clear();
                for (int i = 0; i < 8; ++i) { threads.add(new Thread(new ThreadSyncFn())); }
                start = System.currentTimeMillis();
                for (int i = 0; i < 8; ++i) { threads.get(i).start(); }
                for (int i = 0; i < 8; ++i) { threads.get(i).join(); }
                end = System.currentTimeMillis();
                avg2 += ((end - start) / 1000f);
                l = 0;
            }
            System.out.format("avg1: %f s\navg2: %f s\n", (avg1 / 1000), (avg2 / 1000));
            l = 0;
        } catch (Throwable t) {
            System.out.println(t.toString());
        }
    }
}
Hope that can help add some clarity.
Upvotes: 3
Reputation: 10653
On the contrary, in practice a synchronized method should be slower than a synchronized block, because a synchronized method makes more code sequential.
However, if both guard the same amount of code, then there shouldn't be much difference in performance, as the test below shows.
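To make the "more code sequential" point concrete: a block lets you keep the expensive work outside the critical section, while a synchronized method serializes the whole body. A minimal sketch (class and method names are mine, for illustration):

```java
public class NarrowLock {
    private static final Object lock = new Object();
    private static long total;

    // Synchronized method: the whole computation runs under the lock.
    static synchronized void wholeMethod(double[] arr) {
        double sum = 0;
        for (double d : arr) sum += d * d;
        total += (long) sum;
    }

    // Synchronized block: only the shared update runs under the lock,
    // so the summation can proceed in parallel across threads.
    static void narrowBlock(double[] arr) {
        double sum = 0;
        for (double d : arr) sum += d * d;
        synchronized (lock) {
            total += (long) sum;
        }
    }

    static long getTotal() { return total; }
    static void reset() { total = 0; }
}
```

Both produce the same result; under contention the block version serializes only the single shared update rather than the whole loop.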
Supporting classes
public interface TestMethod {
    public void test(double[] array);
    public String getName();
}
public class TestSynchronizedBlock implements TestMethod {
    private static final Object lock = new Object();

    public void test(double[] arr) { // note: not synchronized, only the block below is
        synchronized (lock) {
            double sum = 0;
            for (double d : arr) {
                for (double d1 : arr) {
                    sum += d * d1;
                }
            }
            //System.out.print(sum + " ");
        }
    }

    @Override
    public String getName() {
        return getClass().getName();
    }
}
public class TestSynchronizedMethod implements TestMethod {
    public synchronized void test(double[] arr) {
        double sum = 0;
        for (double d : arr) {
            for (double d1 : arr) {
                sum += d * d1;
            }
        }
        //System.out.print(sum + " ");
    }

    @Override
    public String getName() {
        return getClass().getName();
    }
}
Main Class
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
public class TestSynchronizedMain {
    public static void main(String[] args) {
        TestSynchronizedMain main = new TestSynchronizedMain();
        TestMethod testMethod = null;
        Random rand = new Random();
        double[] arr = new double[10000];
        for (int j = 0; j < arr.length; j++) {
            arr[j] = rand.nextDouble() * 10000;
        }
        /*testMethod = new TestSynchronizedBlock();
        main.testSynchronized(testMethod, arr);*/
        testMethod = new TestSynchronizedMethod();
        main.testSynchronized(testMethod, arr);
    }

    public void testSynchronized(final TestMethod testMethod, double[] arr) {
        System.out.println("Testing " + testMethod.getName());
        ExecutorService executor = Executors.newCachedThreadPool();
        AtomicLong time = new AtomicLong();
        AtomicLong startCounter = new AtomicLong();
        AtomicLong endCounter = new AtomicLong();
        for (int i = 0; i < 100; i++) {
            executor.submit(new Runnable() {
                @Override
                public void run() {
                    // System.out.println("Started");
                    startCounter.incrementAndGet();
                    long startTime = System.currentTimeMillis();
                    testMethod.test(arr);
                    long endTime = System.currentTimeMillis();
                    long delta = endTime - startTime;
                    //System.out.print(delta + " ");
                    time.addAndGet(delta);
                    endCounter.incrementAndGet();
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
            System.out.println("time taken = " + (time.get() / 1000.0) + " : starts = " + startCounter.get() + " : ends = " + endCounter.get());
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
Main Output in multiple runs
1. Testing TestSynchronizedBlock
time taken = 537.974 : starts = 100 : ends = 100
Testing TestSynchronizedMethod
time taken = 537.052 : starts = 100 : ends = 100
2. Testing TestSynchronizedBlock
time taken = 535.983 : starts = 100 : ends = 100
Testing TestSynchronizedMethod
time taken = 537.534 : starts = 100 : ends = 100
3. Testing TestSynchronizedBlock
time taken = 553.964 : starts = 100 : ends = 100
Testing TestSynchronizedMethod
time taken = 552.352 : starts = 100 : ends = 100
Note: the test was done on a Windows 8, 64-bit, i7 machine. The actual times are not important; the relative values are.
Upvotes: 0
Reputation: 21153
The number of instructions is really not all that different, considering that your synchronized block has a goto that skips the six or so exception-handling instructions after it.
It really boils down to how best to expose an object to access from multiple threads.
Upvotes: 2