Reputation: 3872
I have a "simple" 4 class example that reliably shows unexpected behavior from java synchronization on multiple machines. As you can read below, given the contract of the java sychronized
keyword, Broke Synchronization
should never be printed from the class TestBuffer.
Here are the 4 classes that will reproduce the issue (at least for me). I'm not interested in how to fix this broken example, but rather why it breaks in the first place.
And here is the output I get when I run it:
java -cp . SyncTest
Before Adding
Creating a TestBuffer
Before Remove
Broke Synchronization
1365192
Broke Synchronization
1365193
Broke Synchronization
1365194
Broke Synchronization
1365195
Broke Synchronization
1365196
Done
UPDATE: @Gray has the simplest example that breaks thus far. His example can be found here: Strange JRC Race Condition
Based on the feedback I've gotten from others, it looks like the issue may occur on Java 64-bit 1.6.0_20-1.6.0_31 (unsure about newer 1.6.0's) on Windows and OSX. Nobody has been able to reproduce the issue on Java 7. It may also require a multi-core machine to reproduce the issue.
ORIGINAL QUESTION:
I have a class which provides the following methods:
I've reduced the problem down to the 2 functions below, both of which are in the same object and they're both synchronized
. Unless I am mistaken, "Broke Synchronization" should never be printed because insideGetBuffer
should always be set back to false before remove
can be entered. However, in my application it is printing "Broke Synchronization" when I have 1 thread calling remove repeatedly while the other calls getBuffer repeatedly. The symptom is that I get a ConcurrentModificationException
.
Very strange race condition which looks like a JRE issue
This was confirmed as a bug in Java by Sun. It is apparently fixed (unknowingly?) in jdk7u4, but they have not backported the fix to jdk6. Bug ID: 7176993
Upvotes: 37
Views: 5680
Reputation: 116918
So according to the code that you've posted, you would never get Broke Synchronization
printed unless getBuffer()
throws an exception between the true
and false
setting. See a better pattern below.
Edit:
I took @Luke's code and whittled it down to this pastebin class. As I see it, @Luke is hitting a JRE synchronization bug. I know that's hard to believe but I've been looking at the code and I just can't see the problem.
Since you mention ConcurrentModificationException
, I suspect that getBuffer()
is throwing it when it iterates across the list
. The code that you posted should never throw a ConcurrentModificationException
because of the synchronization but I suspect that some additional code is calling add
or remove
that is not synchronized, or you are removing while you are iterating across the list
. The only way you can modify a un-synchronized collection while you are iterating across it is through the Iterator.remove()
method:
Iterator<Object> iterator = list.iterator();
while (iterator.hasNext()) {
...
// it is ok to remove from the list this way while iterating
iterator.remove();
}
To protect your flag, be sure to use try/finally when you are setting a critical boolean like this. Then any exception would restore the insideGetBuffer
appropriately:
synchronized public Object getBuffer() {
insideGetBuffer = true;
try {
int i=0;
for(Object item : list) {
i++;
}
} finally {
insideGetBuffer = false;
}
return null;
}
Also, it is a better pattern to synchronize around a particular object instead of using method synchronization. If you are trying to protect the list
, then adding synchronization around that list each time would be better.n
synchronized (list) {
list.remove();
}
You can also turn your list into a synchronized list which you wouldn't have to synchronize
on each time:
List<Object> list = Collections.synchronizedList(new ArrayList<Object>());
Upvotes: 10
Reputation: 1377
I had a similar problem before. The error is that you did not declare some field as volatile
. This keyword is used to indicate that a field will be modified by different threads, and thus it cannot be cached. Instead, all the writes and the reads MUST go to the "real" memory location of the field.
For more information, just google for "Java Memory Model"
Albeit most of the readers focus on class TestBuffer
, I think that the problem might be somewhere else (e.g., have you tried to add syncronization on class Controller? Or make volatile its fields?).
ps. notice that different java VMs might use different optimization in different platforms, and thus the synchronization issues might appear more often on a platform than another one. The only way to be safe is to be compliant with the Java's specifications, and file a bug if a VM does not respect it.
Upvotes: 0
Reputation: 3815
I think you are indeed looking at a JVM bug in the OSR. Using the simplified program from @Gray (slight modifications to print an error message) and some options to mess with/print JIT compilation, you can see what is going on with the JIT. And, you can use some options to control that to a degree that can suppress the issue, which lends a lot of evidence to this being a JVM bug.
Running as:
java -XX:+PrintCompilation -XX:CompileThreshold=10000 phil.StrangeRaceConditionTest
you can get an error condition (like others about 80% of the runs) and the compilation print somewhat like:
68 1 java.lang.String::hashCode (64 bytes)
97 2 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (553 bytes)
104 3 java.math.BigInteger::mulAdd (81 bytes)
106 4 java.math.BigInteger::multiplyToLen (219 bytes)
111 5 java.math.BigInteger::addOne (77 bytes)
113 6 java.math.BigInteger::squareToLen (172 bytes)
114 7 java.math.BigInteger::primitiveLeftShift (79 bytes)
116 1% java.math.BigInteger::multiplyToLen @ 138 (219 bytes)
121 8 java.math.BigInteger::montReduce (99 bytes)
126 9 sun.security.provider.SHA::implCompress (491 bytes)
138 10 java.lang.String::charAt (33 bytes)
139 11 java.util.ArrayList::ensureCapacity (58 bytes)
139 12 java.util.ArrayList::add (29 bytes)
139 2% phil.StrangeRaceConditionTest$Buffer::<init> @ 38 (62 bytes)
158 13 java.util.HashMap::indexFor (6 bytes)
159 14 java.util.HashMap::hash (23 bytes)
159 15 java.util.HashMap::get (79 bytes)
159 16 java.lang.Integer::valueOf (32 bytes)
168 17 s phil.StrangeRaceConditionTest::getBuffer (66 bytes)
168 18 s phil.StrangeRaceConditionTest::remove (10 bytes)
171 19 s phil.StrangeRaceConditionTest$Buffer::remove (34 bytes)
172 3% phil.StrangeRaceConditionTest::strangeRaceConditionTest @ 36 (76 bytes)
ERRORS //my little change
219 15 made not entrant java.util.HashMap::get (79 bytes)
There are three OSR replacements (the ones with the % annotation on the compile ID). My guess is that it is the third one, which is the loop calling remove(), that is responsible for the error. This can be excluded from JIT via a .hotspot_compiler file located in the working directory with the following contents:
exclude phil/StrangeRaceConditionTest strangeRaceConditionTest
When you run the program again, you get this output:
CompilerOracle: exclude phil/StrangeRaceConditionTest.strangeRaceConditionTest
73 1 java.lang.String::hashCode (64 bytes)
104 2 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (553 bytes)
110 3 java.math.BigInteger::mulAdd (81 bytes)
113 4 java.math.BigInteger::multiplyToLen (219 bytes)
118 5 java.math.BigInteger::addOne (77 bytes)
120 6 java.math.BigInteger::squareToLen (172 bytes)
121 7 java.math.BigInteger::primitiveLeftShift (79 bytes)
123 1% java.math.BigInteger::multiplyToLen @ 138 (219 bytes)
128 8 java.math.BigInteger::montReduce (99 bytes)
133 9 sun.security.provider.SHA::implCompress (491 bytes)
145 10 java.lang.String::charAt (33 bytes)
145 11 java.util.ArrayList::ensureCapacity (58 bytes)
146 12 java.util.ArrayList::add (29 bytes)
146 2% phil.StrangeRaceConditionTest$Buffer::<init> @ 38 (62 bytes)
165 13 java.util.HashMap::indexFor (6 bytes)
165 14 java.util.HashMap::hash (23 bytes)
165 15 java.util.HashMap::get (79 bytes)
166 16 java.lang.Integer::valueOf (32 bytes)
174 17 s phil.StrangeRaceConditionTest::getBuffer (66 bytes)
174 18 s phil.StrangeRaceConditionTest::remove (10 bytes)
### Excluding compile: phil.StrangeRaceConditionTest::strangeRaceConditionTest
177 19 s phil.StrangeRaceConditionTest$Buffer::remove (34 bytes)
324 15 made not entrant java.util.HashMap::get (79 bytes)
and the problem does not appear (at least not in the repeated attempts that I've made).
Also, if you change the JVM options a bit, you can cause the problem to go away. Using either of the following I cannot get the problem to appear.
java -XX:+PrintCompilation -XX:CompileThreshold=100000 phil.StrangeRaceConditionTest
java -XX:+PrintCompilation -XX:FreqInlineSize=1 phil.StrangeRaceConditionTest
Interestingly, the compilation output for both of these still show the OSR for the remove loop. My guess (and it is a big one) is that delaying the JIT via the compilation threshold or changing the FreqInlineSize cause changes to the OSR processing in these cases that bypass a bug that you are otherwise hitting.
See here for info on the JVM options.
See here and here for information on the output of -XX:+PrintCompilation and how to mess with what the JIT does.
Upvotes: 17
Reputation: 1519
'getBuffer' function in the Controller class is creating this issue. If two threads enter at the same time to the following 'if' condition for the first time, then controller will end up creating two buffer objects. add function is called on the first object and remove will be called on the second object.
if (colorToBufferMap.containsKey(defaultColor)) {
When two threads (add and remove threads) enter at the same time (when the buffer was not yet added to colorToBufferMap), above if statement will return false and both the threads will enter the else part and create two buffers, since buffer is a local variable these two threads will receive two different instance of buffer as part of return statement. However, only the last one created will be stored in the global variable 'colorToBufferMap'.
Above problematic line is part of getBuffer function
public TestBuffer getBuffer() {
TestBuffer buffer = null;
if (colorToBufferMap.containsKey(defaultColor)) {
buffer = colorToBufferMap.get(defaultColor);
} else {
buffer = new TestBuffer();
colorToBufferMap.put(defaultColor, buffer);
}
return buffer;
}
Synchronizing the 'getBuffer' function in Controller class will fix this issue.
Upvotes: 2
Reputation: 1392
I don't believe this is a bug in the JVM.
My first suspicion was that it was some sort of operation reordering that the compiler is doing (on my machine, it works fine in a debugger, but sync fails when running) but
I can't tell you why, but I very strongly suspect that something is giving up the lock on TestBuffer that is implicit in declaring getBuffer() and remove(...) synchronized.
For example, replace them with this:
public void getBuffer() {
synchronized (this) {
this.insideGetBuffer = true;
try {
int i = 0;
for (Object item : this.list) {
if (item != null) {
i++;
}
}
} finally {
this.insideGetBuffer = false;
}
}
}
public void remove(final Object item) {
synchronized (this) {
// fails if this is called while getBuffer is running
if (this.insideGetBuffer) {
System.out.println("Broke Synchronization ");
System.out.println(item);
}
}
}
And you still have your sync error. But pick something else to log on, eg:
private Object lock = new Object();
public void getBuffer() {
synchronized (this.lock) {
this.insideGetBuffer = true;
try {
int i = 0;
for (Object item : this.list) {
if (item != null) {
i++;
}
}
} finally {
this.insideGetBuffer = false;
}
}
}
public void remove(final Object item) {
synchronized (this.lock) {
// fails if this is called while getBuffer is running
if (this.insideGetBuffer) {
System.out.println("Broke Synchronization ");
System.out.println(item);
}
}
}
And everything works as expected.
Now, you can simulate giving up the lock by adding:
this.lock.wait(1);
in the for loop of getBuffer() and you'll start failing again.
I remain stumped on what is giving up the lock, but in general it might be a better idea to use explicit synchronization on protected locks than the synchronization operator.
Upvotes: 1
Reputation: 40266
Based on that code there are only two ways that "Broke Synchronization" will print.
insideGetBuffer
is being changed by another thread outside of synchronized block.Without those two there can't be a way that code you listed will be printing "Broke Synchronization" & the ConcurrentModificationException
. Can you give a small snippet of code that can be run to prove what you are saying?
Update:
I went through the example Luke posted and I am seeing odd behaviors on Java 1.6_24-64 bit Windows. The same instance of TestBuffer and the value of the insideGetBuffer
is 'alternating' inside the remove method. Note the field is not updated outside a synchronized region. There is only one TestBuffer instance but let's assume they aren't - insideGetBuffer
would never be set to true (so it must be the same instance).
synchronized public void remove(Object item) {
boolean b = insideGetBuffer;
if(insideGetBuffer){
System.out.println("Broke Synchronization : " + b + " - " + insideGetBuffer);
}
}
Sometimes it prints Broke Synchronization : true - false
I am working on getting the assembler to run on Windows 64 bit Java.
Upvotes: 4
Reputation: 533820
Can you try this code which is a self contained test?
public static class TestBuffer {
private final List<Object> list = new ArrayList<Object>();
private boolean insideGetBuffer = false;
public TestBuffer() {
System.out.println("Creating a TestBuffer");
}
synchronized public void add(Object item) {
list.add(item);
}
synchronized public void remove(Object item) {
if (insideGetBuffer) {
System.out.println("Broke Synchronization ");
System.out.println(item);
}
list.remove(item);
}
synchronized public void getBuffer() {
insideGetBuffer = true;
// System.out.println("getBuffer.");
try {
int count = 0;
for (int i = 0, listSize = list.size(); i < listSize; i++) {
if (list.get(i) != null)
count++;
}
} finally {
// System.out.println(".getBuffer");
insideGetBuffer = false;
}
}
}
public static void main(String... args) throws IOException {
final TestBuffer tb = new TestBuffer();
ExecutorService service = Executors.newCachedThreadPool();
final AtomicLong count = new AtomicLong();
for (int i = 0; i < 16; i++) {
final int finalI = i;
service.submit(new Runnable() {
@Override
public void run() {
while (true) {
for (int j = 0; j < 1000000; j++) {
tb.add(finalI);
tb.getBuffer();
tb.remove(finalI);
}
System.out.printf("%d,: %,d%n", finalI, count.addAndGet(1000000));
}
}
});
}
}
prints
Creating a TestBuffer
11,: 1,000,000
2,: 2,000,000
... many deleted ...
2,: 100,000,000
1,: 101,000,000
Looking at your stack trace in more detail.
Caused by: java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
at java.util.HashMap$KeyIterator.next(Unknown Source)
at <removed>.getBuffer(<removed>.java:62)
You can see that you are accessing the key set of a HashMap, not a list. This is important because the key set is a view on the underlying map. This means you need to ensure that every access to this map is also protected by the same lock. e.g. say you have a setter like
Collection list;
public void setList(Collection list) { this.list = list; }
// somewhere else
Map map = new HashMap();
obj.setList(map.keySet());
// "list" is accessed in another thread which is locked by this thread does this
map.put("hello", "world");
// now an Iterator in another thread on list is invalid.
Upvotes: 2
Reputation: 7886
If access to list and insideGetBuffer is fully contained in that code, the code looks certainly thread safe and I do not see a possibility "Broke synchronization" can be printed, barring a JVM bug.
Can you double check all possible access to your member variables (list and insideGetBuffer)? Possibilities include if list was passed onto you through constructor (which your code doesn't show) or these variables are protected variables so subclasses can change them.
Another possibility is access via reflection.
Upvotes: 1
Reputation: 2242
Edit: Answer valid only when two different Object instances are used in calling the methods repeatedly.
The scenario: You have two synchronized method. One for removing an entity and another for accessing. The problem comes when 1 thread is inside the remove method and another thread is in the getBuffer method and sets the insideGetBuffer=true.
As you found out you need to put synchronization on the list because both these methods work on you list.
Upvotes: 1
Reputation: 692121
A ConcurrentModificationException, most of the time, is not caused by concurrent threads. It's caused by the modification of the collection while it's being iterated:
for (Object item : list) {
if (someCondition) {
list.remove(item);
}
}
The above code would cause a ConcurrentModificationException if someCondition is true. While iterating, the collection can only be modified through the iterator's methods:
for (Iterator<Object> it = list.iterator(); it.hasNext(); ) {
Object item = it.next();
if (someCondition) {
it.remove();
}
}
I suspect that this is what happens in your real code. The posted code is fine.
Upvotes: 2