Reputation: 105
We had observed a NullPointerException when trying to get a value assoicated with a given key in HashMap.
Following is the example code which I will be using to illustrate the problem.
public class Test {
private Map<String, Integer> employeeNameToAgeMap = new HashMap<String, Integer>();
public int getAge(String employeeName) {
if (!employeeNameToAgeMap.containsKey(employeeName)) {
int age = getAgeFromSomeCustomAPI(employeeName);
employeeNameToAgeMap.put(employeeName, age);
}
return employeeNameToAgeMap.get(employeeName);
}
}
Getting a NullPointerException at the last line of the method which is " return employeeNameToAgeMap.get(employeeName);"
As we can see employeeNameToAgeMap is not null and also the caller is not passing employeeName as null (this we have taken in the calling code itself).
This method would be called from different threads and at a very fast rate ( from some timer tasks which are scheduled to run for every 100ms or so )
The reason for this NullPointerException seems to be that the value(age) being put for the given employee is null, but this is not the case as that custom API method (getAgeFromSomeCustomAPI()) is guranteed to return some age for a given employee, even if it was returning null then exception stacktrace should have shown that corresponding line in the logs than the last line.
My only assumption is that while a thread T1 is trying to populate that cache, T2 came up and for some reason it was able to find that cache is having the employeeName already, but when it tried to get the age, it throwed a NPE. But I'm not 100% convinced that while put() operation is in progress for a given key and value, containsKey() for the same key returned true.
I'm aware that this code needs to be enhanced to cater synchronization issues ( by using ConcurrentHashMap or locks), but looking forward to know the real cause for this issue.
I would really appreciate the help.
Upvotes: 7
Views: 11875
Reputation: 14269
This is an excellent example of why something that is not considered thread-safe should not be used in a threaded environment, even if you cannot imagine what could possibly go wrong. The problem here is the lack of imagination.
There are two things that can possibly go wrong here:
The Java execution re-ordering is your enemy.
As defined with the keyword volatile the order of execution of code can be re-arranged by the JVM for whatever purpose, as long as the result is the same in a single-threaded environment. So the key-value pair could be added first, before value is actually set, causing simultaneous get
calls to return the intermediate null
value.
Some mechanic inside the hash-map implementation is using a lazy-set mechanism, because that turned out to be much faster on the specific implementation. While this is not the case in the code I have seen so far, it tells you that you should not expect something to be as you would code it.
The lesson to be learned: Stick to the docs and only to the docs, because everything else is not defined and therefore might be changed or is already much different to what you would expect.
Upvotes: 2
Reputation: 8708
I believe what you are experiencing is a rehash of the HashMap during your call to
return employeeNameToAgeMap.get(employeeName);
One could believe, that if HashMap#containsKey(key) returns true, it should be guaranteed, that calling HashMap#get(key) should also return a valid value, as long as the key is not removed from the HashMap. This could be argued by the fact, that HashMap#containsKey(key) does indeed check, if the key corresponds to a valid value:
public boolean containsKey(Object key) {
return getEntry(key) != null;
}
But this is a fatal misconception. HashMap#containsKey(key) does only guarantee, that the key has already been associated to some value sometime before it was called. But it does not guarantee, that HashMap#get(key) will also return the corresponding value, if multiple threads are accessing the map. The reason for this discrepancy is, that other threads accessing HashMap#put(key,value) with any key-value pair, may force a rehash of the HashMap, which results in the recreation of the internal hash table. If such a rehash happens during your call to HashMap#get(key), it is possible, that HashMap#get(key) returns null, even though your HashMap previously returned true when calling HashMap#containsKey(key).
If you just want to avoid the NullPointerException, you could do this:
public class Test {
private Map<String, Integer> employeeNameToAgeMap = new HashMap<String, Integer>();
public int getAge(String employeeName) {
final Integer age = employeeNameToAgeMap.get(employeeName);
if (age == null) {
age = getAgeFromSomeCustomAPI(employeeName);
employeeNameToAgeMap.put(employeeName, age);
}
return (int)age;
}
}
This would of course not make your code thread save, but you would no longer get the NullPointerException you are experiencing now.
Upvotes: 4
Reputation: 9651
I'm not 100% convinced that while put() operation is in progress for a given key and value, containsKey() for the same key returned true.
You're correct - without any synchronization, there's no guarantee that a call to put() in one thread will result in another thread returning true for containsKey(). This is true even if the call to put() has completed.
The Java Memory Model allows the memory reads/writes by a thread to be re-ordered. This is often known as out-of-order execution. The result is the internal state of the Map seen by any given thread may be inconsistent, and this can lead to data corruption, crashes, and infinite loops.
For your simple example, it looks like you can simply replace HashMap
with ConcurrentHashMap
, but it's difficult to be sure that's correct without seeing the rest of your program.
I suggest reading Java Concurrency in Practice by Brian Goetz to get a better understanding of the Java Memory Model.
You might also be interested in this blog post, A Beautiful Race Condition, which shows why sharing a HashMap between threads without synchronization can cause unexpected behaviour.
Edit: It's probably worth drawing your attention to this part of the JavaDoc for HashMap:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new HashMap(...));
Upvotes: 4
Reputation: 328618
employeeNameToAgeMap.get(employeeName);
returns an Integer
. If that Integer
is null, the auto-unboxing to int
necessary to return an int
from your method throws a NPE.
So you should write something like:
Integer result = employeeNameToAgeMap.get(employeeName);
return result == null ? -1 : result;
Alternatively, you could throw an exception, say a EmployeeNotFoundException
.
Or you could also return an 'Integer`, document that the returned value may be null, and let the client handle the null case.
Upvotes: -1