Luke
Luke

Reputation: 3872

Java Synchronization Not Working as Expected

I have a "simple" 4 class example that reliably shows unexpected behavior from java synchronization on multiple machines. As you can read below, given the contract of the java sychronized keyword, Broke Synchronization should never be printed from the class TestBuffer.

Here are the 4 classes that will reproduce the issue (at least for me). I'm not interested in how to fix this broken example, but rather why it breaks in the first place.

Sync Issue - Controller.java

Sync Issue - SyncTest.java

Sync Issue - TestBuffer.java

Sync Issue - Tuple3f.java

And here is the output I get when I run it:

java -cp . SyncTest
Before Adding
Creating a TestBuffer
Before Remove
Broke Synchronization
1365192
Broke Synchronization
1365193
Broke Synchronization
1365194
Broke Synchronization
1365195
Broke Synchronization
1365196
Done

UPDATE: @Gray has the simplest example that breaks thus far. His example can be found here: Strange JRC Race Condition

Based on the feedback I've gotten from others, it looks like the issue may occur on Java 64-bit 1.6.0_20-1.6.0_31 (unsure about newer 1.6.0's) on Windows and OSX. Nobody has been able to reproduce the issue on Java 7. It may also require a multi-core machine to reproduce the issue.

ORIGINAL QUESTION:

I have a class which provides the following methods:

I've reduced the problem down to the 2 functions below, both of which are in the same object and they're both synchronized. Unless I am mistaken, "Broke Synchronization" should never be printed because insideGetBuffer should always be set back to false before remove can be entered. However, in my application it is printing "Broke Synchronization" when I have 1 thread calling remove repeatedly while the other calls getBuffer repeatedly. The symptom is that I get a ConcurrentModificationException.

See Also:

Very strange race condition which looks like a JRE issue

Sun Bug Report:

This was confirmed as a bug in Java by Sun. It is apparently fixed (unknowingly?) in jdk7u4, but they have not backported the fix to jdk6. Bug ID: 7176993

Upvotes: 37

Views: 5680

Answers (10)

Gray
Gray

Reputation: 116918

So according to the code that you've posted, you would never get Broke Synchronization printed unless getBuffer() throws an exception between the true and false setting. See a better pattern below.

Edit:

I took @Luke's code and whittled it down to this pastebin class. As I see it, @Luke is hitting a JRE synchronization bug. I know that's hard to believe but I've been looking at the code and I just can't see the problem.


Since you mention ConcurrentModificationException, I suspect that getBuffer() is throwing it when it iterates across the list. The code that you posted should never throw a ConcurrentModificationException because of the synchronization but I suspect that some additional code is calling add or remove that is not synchronized, or you are removing while you are iterating across the list. The only way you can modify a un-synchronized collection while you are iterating across it is through the Iterator.remove() method:

Iterator<Object> iterator = list.iterator();
while (iterator.hasNext()) {
   ...
   // it is ok to remove from the list this way while iterating
   iterator.remove();
}

To protect your flag, be sure to use try/finally when you are setting a critical boolean like this. Then any exception would restore the insideGetBuffer appropriately:

synchronized public Object getBuffer() {
    insideGetBuffer = true;
    try {
        int i=0;
        for(Object item : list) {
            i++;
        }
    } finally {
        insideGetBuffer = false;
    }
    return null;
}

Also, it is a better pattern to synchronize around a particular object instead of using method synchronization. If you are trying to protect the list, then adding synchronization around that list each time would be better.n

 synchronized (list) {
    list.remove();
 }

You can also turn your list into a synchronized list which you wouldn't have to synchronize on each time:

 List<Object> list = Collections.synchronizedList(new ArrayList<Object>());

Upvotes: 10

Matteo
Matteo

Reputation: 1377

I had a similar problem before. The error is that you did not declare some field as volatile. This keyword is used to indicate that a field will be modified by different threads, and thus it cannot be cached. Instead, all the writes and the reads MUST go to the "real" memory location of the field.

For more information, just google for "Java Memory Model"

Albeit most of the readers focus on class TestBuffer, I think that the problem might be somewhere else (e.g., have you tried to add syncronization on class Controller? Or make volatile its fields?).

ps. notice that different java VMs might use different optimization in different platforms, and thus the synchronization issues might appear more often on a platform than another one. The only way to be safe is to be compliant with the Java's specifications, and file a bug if a VM does not respect it.

Upvotes: 0

philwb
philwb

Reputation: 3815

I think you are indeed looking at a JVM bug in the OSR. Using the simplified program from @Gray (slight modifications to print an error message) and some options to mess with/print JIT compilation, you can see what is going on with the JIT. And, you can use some options to control that to a degree that can suppress the issue, which lends a lot of evidence to this being a JVM bug.

Running as:

java -XX:+PrintCompilation -XX:CompileThreshold=10000 phil.StrangeRaceConditionTest

you can get an error condition (like others about 80% of the runs) and the compilation print somewhat like:

 68   1       java.lang.String::hashCode (64 bytes)
 97   2       sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (553 bytes)
104   3       java.math.BigInteger::mulAdd (81 bytes)
106   4       java.math.BigInteger::multiplyToLen (219 bytes)
111   5       java.math.BigInteger::addOne (77 bytes)
113   6       java.math.BigInteger::squareToLen (172 bytes)
114   7       java.math.BigInteger::primitiveLeftShift (79 bytes)
116   1%      java.math.BigInteger::multiplyToLen @ 138 (219 bytes)
121   8       java.math.BigInteger::montReduce (99 bytes)
126   9       sun.security.provider.SHA::implCompress (491 bytes)
138  10       java.lang.String::charAt (33 bytes)
139  11       java.util.ArrayList::ensureCapacity (58 bytes)
139  12       java.util.ArrayList::add (29 bytes)
139   2%      phil.StrangeRaceConditionTest$Buffer::<init> @ 38 (62 bytes)
158  13       java.util.HashMap::indexFor (6 bytes)
159  14       java.util.HashMap::hash (23 bytes)
159  15       java.util.HashMap::get (79 bytes)
159  16       java.lang.Integer::valueOf (32 bytes)
168  17 s     phil.StrangeRaceConditionTest::getBuffer (66 bytes)
168  18 s     phil.StrangeRaceConditionTest::remove (10 bytes)
171  19 s     phil.StrangeRaceConditionTest$Buffer::remove (34 bytes)
172   3%      phil.StrangeRaceConditionTest::strangeRaceConditionTest @ 36 (76 bytes)
ERRORS //my little change
219  15      made not entrant  java.util.HashMap::get (79 bytes)

There are three OSR replacements (the ones with the % annotation on the compile ID). My guess is that it is the third one, which is the loop calling remove(), that is responsible for the error. This can be excluded from JIT via a .hotspot_compiler file located in the working directory with the following contents:

exclude phil/StrangeRaceConditionTest strangeRaceConditionTest

When you run the program again, you get this output:

CompilerOracle: exclude phil/StrangeRaceConditionTest.strangeRaceConditionTest
 73   1       java.lang.String::hashCode (64 bytes)
104   2       sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (553 bytes)
110   3       java.math.BigInteger::mulAdd (81 bytes)
113   4       java.math.BigInteger::multiplyToLen (219 bytes)
118   5       java.math.BigInteger::addOne (77 bytes)
120   6       java.math.BigInteger::squareToLen (172 bytes)
121   7       java.math.BigInteger::primitiveLeftShift (79 bytes)
123   1%      java.math.BigInteger::multiplyToLen @ 138 (219 bytes)
128   8       java.math.BigInteger::montReduce (99 bytes)
133   9       sun.security.provider.SHA::implCompress (491 bytes)
145  10       java.lang.String::charAt (33 bytes)
145  11       java.util.ArrayList::ensureCapacity (58 bytes)
146  12       java.util.ArrayList::add (29 bytes)
146   2%      phil.StrangeRaceConditionTest$Buffer::<init> @ 38 (62 bytes)
165  13       java.util.HashMap::indexFor (6 bytes)
165  14       java.util.HashMap::hash (23 bytes)
165  15       java.util.HashMap::get (79 bytes)
166  16       java.lang.Integer::valueOf (32 bytes)
174  17 s     phil.StrangeRaceConditionTest::getBuffer (66 bytes)
174  18 s     phil.StrangeRaceConditionTest::remove (10 bytes)
### Excluding compile: phil.StrangeRaceConditionTest::strangeRaceConditionTest
177  19 s     phil.StrangeRaceConditionTest$Buffer::remove (34 bytes)
324  15      made not entrant  java.util.HashMap::get (79 bytes)

and the problem does not appear (at least not in the repeated attempts that I've made).

Also, if you change the JVM options a bit, you can cause the problem to go away. Using either of the following I cannot get the problem to appear.

java -XX:+PrintCompilation -XX:CompileThreshold=100000 phil.StrangeRaceConditionTest
java -XX:+PrintCompilation -XX:FreqInlineSize=1 phil.StrangeRaceConditionTest

Interestingly, the compilation output for both of these still show the OSR for the remove loop. My guess (and it is a big one) is that delaying the JIT via the compilation threshold or changing the FreqInlineSize cause changes to the OSR processing in these cases that bypass a bug that you are otherwise hitting.

See here for info on the JVM options.

See here and here for information on the output of -XX:+PrintCompilation and how to mess with what the JIT does.

Upvotes: 17

sperumal
sperumal

Reputation: 1519

'getBuffer' function in the Controller class is creating this issue. If two threads enter at the same time to the following 'if' condition for the first time, then controller will end up creating two buffer objects. add function is called on the first object and remove will be called on the second object.

if (colorToBufferMap.containsKey(defaultColor)) {

When two threads (add and remove threads) enter at the same time (when the buffer was not yet added to colorToBufferMap), above if statement will return false and both the threads will enter the else part and create two buffers, since buffer is a local variable these two threads will receive two different instance of buffer as part of return statement. However, only the last one created will be stored in the global variable 'colorToBufferMap'.

Above problematic line is part of getBuffer function

public TestBuffer getBuffer() {
    TestBuffer buffer = null;
    if (colorToBufferMap.containsKey(defaultColor)) {
        buffer = colorToBufferMap.get(defaultColor);
    } else {
        buffer = new TestBuffer();
        colorToBufferMap.put(defaultColor, buffer);
    }
    return buffer;
}

Synchronizing the 'getBuffer' function in Controller class will fix this issue.

Upvotes: 2

Mason Bryant
Mason Bryant

Reputation: 1392

I don't believe this is a bug in the JVM.

My first suspicion was that it was some sort of operation reordering that the compiler is doing (on my machine, it works fine in a debugger, but sync fails when running) but

I can't tell you why, but I very strongly suspect that something is giving up the lock on TestBuffer that is implicit in declaring getBuffer() and remove(...) synchronized.

For example, replace them with this:

public void getBuffer() {
    synchronized (this) {
        this.insideGetBuffer = true;
        try {
            int i = 0;
            for (Object item : this.list) {
                if (item != null) {
                    i++;
                }
            }
        } finally {
            this.insideGetBuffer = false;
        }
    }

}

public void remove(final Object item) {
    synchronized (this) {
        // fails if this is called while getBuffer is running
        if (this.insideGetBuffer) {
            System.out.println("Broke Synchronization ");
            System.out.println(item);
        }
    }
}

And you still have your sync error. But pick something else to log on, eg:

private Object lock = new Object();
public void getBuffer() {
    synchronized (this.lock) {
        this.insideGetBuffer = true;
        try {
            int i = 0;
            for (Object item : this.list) {
                if (item != null) {
                    i++;
                }
            }
        } finally {
            this.insideGetBuffer = false;
        }
    }

}

public void remove(final Object item) {
    synchronized (this.lock) {
        // fails if this is called while getBuffer is running
        if (this.insideGetBuffer) {
            System.out.println("Broke Synchronization ");
            System.out.println(item);
        }
    }
}

And everything works as expected.

Now, you can simulate giving up the lock by adding:

this.lock.wait(1);

in the for loop of getBuffer() and you'll start failing again.

I remain stumped on what is giving up the lock, but in general it might be a better idea to use explicit synchronization on protected locks than the synchronization operator.

Upvotes: 1

John Vint
John Vint

Reputation: 40266

Based on that code there are only two ways that "Broke Synchronization" will print.

  1. They are synchronizing on different object (which you say they are not)
  2. The insideGetBuffer is being changed by another thread outside of synchronized block.

Without those two there can't be a way that code you listed will be printing "Broke Synchronization" & the ConcurrentModificationException. Can you give a small snippet of code that can be run to prove what you are saying?

Update:

I went through the example Luke posted and I am seeing odd behaviors on Java 1.6_24-64 bit Windows. The same instance of TestBuffer and the value of the insideGetBuffer is 'alternating' inside the remove method. Note the field is not updated outside a synchronized region. There is only one TestBuffer instance but let's assume they aren't - insideGetBuffer would never be set to true (so it must be the same instance).

    synchronized public void remove(Object item) {

            boolean b = insideGetBuffer;
            if(insideGetBuffer){
                    System.out.println("Broke Synchronization : " +  b + " - " + insideGetBuffer);
            }
    }

Sometimes it prints Broke Synchronization : true - false

I am working on getting the assembler to run on Windows 64 bit Java.

Upvotes: 4

Peter Lawrey
Peter Lawrey

Reputation: 533820

Can you try this code which is a self contained test?

public static class TestBuffer {
    private final List<Object> list = new ArrayList<Object>();
    private boolean insideGetBuffer = false;

    public TestBuffer() {
        System.out.println("Creating a TestBuffer");
    }

    synchronized public void add(Object item) {
        list.add(item);
    }

    synchronized public void remove(Object item) {
        if (insideGetBuffer) {
            System.out.println("Broke Synchronization ");
            System.out.println(item);
        }

        list.remove(item);
    }

    synchronized public void getBuffer() {
        insideGetBuffer = true;
//      System.out.println("getBuffer.");
        try {
            int count = 0;
            for (int i = 0, listSize = list.size(); i < listSize; i++) {
                if (list.get(i) != null)
                    count++;
            }
        } finally {
//          System.out.println(".getBuffer");
            insideGetBuffer = false;
        }
    }
}

public static void main(String... args) throws IOException {
    final TestBuffer tb = new TestBuffer();
    ExecutorService service = Executors.newCachedThreadPool();
    final AtomicLong count = new AtomicLong();
    for (int i = 0; i < 16; i++) {
        final int finalI = i;
        service.submit(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    for (int j = 0; j < 1000000; j++) {
                        tb.add(finalI);
                        tb.getBuffer();
                        tb.remove(finalI);
                    }
                    System.out.printf("%d,: %,d%n", finalI, count.addAndGet(1000000));
                }
            }
        });
    }
}

prints

Creating a TestBuffer
11,: 1,000,000
2,: 2,000,000
... many deleted ...
2,: 100,000,000
1,: 101,000,000

Looking at your stack trace in more detail.

Caused by: java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
    at java.util.HashMap$KeyIterator.next(Unknown Source)
    at <removed>.getBuffer(<removed>.java:62)

You can see that you are accessing the key set of a HashMap, not a list. This is important because the key set is a view on the underlying map. This means you need to ensure that every access to this map is also protected by the same lock. e.g. say you have a setter like

Collection list;
public void setList(Collection list) { this.list = list; }


// somewhere else
Map map = new HashMap();
obj.setList(map.keySet());

// "list" is accessed in another thread which is locked by this thread does this
map.put("hello", "world");
// now an Iterator in another thread on list is invalid.

Upvotes: 2

sjlee
sjlee

Reputation: 7886

If access to list and insideGetBuffer is fully contained in that code, the code looks certainly thread safe and I do not see a possibility "Broke synchronization" can be printed, barring a JVM bug.

Can you double check all possible access to your member variables (list and insideGetBuffer)? Possibilities include if list was passed onto you through constructor (which your code doesn't show) or these variables are protected variables so subclasses can change them.

Another possibility is access via reflection.

Upvotes: 1

Akhi
Akhi

Reputation: 2242

Edit: Answer valid only when two different Object instances are used in calling the methods repeatedly.

The scenario: You have two synchronized method. One for removing an entity and another for accessing. The problem comes when 1 thread is inside the remove method and another thread is in the getBuffer method and sets the insideGetBuffer=true.

As you found out you need to put synchronization on the list because both these methods work on you list.

Upvotes: 1

JB Nizet
JB Nizet

Reputation: 692121

A ConcurrentModificationException, most of the time, is not caused by concurrent threads. It's caused by the modification of the collection while it's being iterated:

for (Object item : list) {
    if (someCondition) {
         list.remove(item);
    }
}

The above code would cause a ConcurrentModificationException if someCondition is true. While iterating, the collection can only be modified through the iterator's methods:

for (Iterator<Object> it = list.iterator(); it.hasNext(); ) {
    Object item = it.next();
    if (someCondition) {
         it.remove();
    }
}

I suspect that this is what happens in your real code. The posted code is fine.

Upvotes: 2

Related Questions