Reputation: 1499
I have a piece of code that can be executed by multiple threads that needs to perform an I/O-bound operation in order to initialize a shared resource that is stored in a ConcurrentMap. I need to make this code thread safe and avoid unnecessary calls to initialize the shared resource. Here's the buggy code:
private ConcurrentMap<String, Resource> map;
// .....
String key = "somekey";
Resource resource;
if (map.containsKey(key)) {
    resource = map.get(key);
} else {
    resource = getResource(key); // I/O-bound, expensive operation
    map.put(key, resource);
}
With the above code, multiple threads may check the ConcurrentMap and see that the resource isn't there, and all attempt to call getResource(), which is expensive. In order to ensure only a single initialization of the shared resource and to make the code efficient once the resource has been initialized, I want to do something like this:
String key = "somekey";
Resource resource;
if (!map.containsKey(key)) {
    synchronized (map) {
        if (!map.containsKey(key)) {
            resource = getResource(key);
            map.put(key, resource);
        }
    }
}
Is this a safe version of double checked locking? It seems to me that since the checks are called on ConcurrentMap, it behaves like a shared resource that is declared to be volatile and thus prevents any of the "partial initialization" problems that may happen.
Upvotes: 12
Views: 4555
Reputation: 99
Yes, your double checked locking version is thread safe, as long as the map is a ConcurrentHashMap. As I understand your question, you want the method to be completely thread safe (no race conditions) and you want the Resource object to be loaded lazily per key. Lazy loading gets more challenging in a multi-threaded environment. Let's analyze each version of the method.
Approach-1
public synchronized Resource getResource(String key) {
    Resource resource = map.get(key);
    if (resource == null) {
        resource = expensiveGetResourceOperation(key);
        map.put(key, resource);
    }
    return resource;
}
Analysis: This approach takes the intrinsic lock of the object on which the method is invoked (say the class is named ResourceHolder). If multiple threads call the method with the same key on different instances, they can run it simultaneously, which is a race if those instances share the same map. If you are sure that all threads call the method on the same instance (I assume this is the case), then yes, it is thread safe. But this approach does not scale and can degrade performance significantly: the synchronized keyword can't differentiate between reading and writing threads, so only one thread at a time gets into the critical section and every read and write on the map is serialized. That defeats the purpose of using ConcurrentHashMap.
Approach-2
public Resource getResource(String key) {
    Resource resource = map.get(key);
    if (resource == null) {
        synchronized (map) {
            resource = map.get(key);
            if (resource == null) {
                resource = expensiveGetResourceOperation(key);
                map.put(key, resource);
            }
        }
    }
    return resource;
}
Analysis: This double checked locking approach improves overall performance compared to Approach-1 while keeping the method thread safe. The lock is associated only with the Map instance (which acts as a private lock object as long as the map is not exposed) instead of the intrinsic lock of the enclosing object (ResourceHolder). Multiple threads can read from the Map simultaneously, so reads are fast and non-blocking when the key already exists. If the key doesn't exist and multiple threads try to create the resource, only one thread at a time enters the synchronized block; it creates the resource and releases the lock when it leaves the block. The other threads then find the cached value in the second check, so there are no more unnecessary calls to the expensiveGetResourceOperation method.
There is still a performance issue when a key doesn't exist, because the lock covers the entire Map rather than a single key. If multiple threads arrive with different keys that are not yet in the Map, only one of them can create and insert its Resource at a time and the others are blocked. Locking the map object is finer-grained than locking the whole ResourceHolder, since it doesn't affect the rest of the class's methods. But if other code also synchronizes on map for different reasons, this leads to lock contention, even between threads that are interested in different keys.
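To make the Approach-2 behaviour concrete, here is a minimal sketch that counts how often the loader runs when several threads ask for the same missing key. The class name DoubleCheckedMapDemo, the expensiveLoad stand-in, the loads counter and the loadsAfterConcurrentCalls helper are all made up for illustration, and String replaces the Resource type so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class DoubleCheckedMapDemo {
    static final ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
    static final AtomicInteger loads = new AtomicInteger();

    // Hypothetical stand-in for the expensive, I/O-bound operation.
    static String expensiveLoad(String key) {
        loads.incrementAndGet();
        return "resource-for-" + key;
    }

    static String getResource(String key) {
        String resource = map.get(key);      // fast path: no lock once the key is cached
        if (resource == null) {
            synchronized (map) {             // slow path: one loading thread at a time
                resource = map.get(key);     // second check, now under the lock
                if (resource == null) {
                    resource = expensiveLoad(key);
                    map.put(key, resource);
                }
            }
        }
        return resource;
    }

    // Calls getResource(key) from several threads and returns the total load count.
    static int loadsAfterConcurrentCalls(String key, int threads) {
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    start.await();           // release all threads on the same signal
                    getResource(key);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }
        start.countDown();
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return loads.get();
    }

    public static void main(String[] args) {
        // Eight threads race for one missing key; prints "loads = 1".
        System.out.println("loads = " + loadsAfterConcurrentCalls("somekey", 8));
    }
}
```

The count stays at one per key because every thread that misses on the fast path re-checks under the lock. The price is the one described above: loads for different keys are also serialized on the same lock.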
Approach-3
public Resource getResource(String key) {
    /* equivalent code below
    Resource r = expensiveGetResourceOperation(key);
    Resource res = map.putIfAbsent(key, r);
    return res == null ? r : res;
    */
    map.putIfAbsent(key, expensiveGetResourceOperation(key));
    return map.get(key);
}
Analysis: If we look closely at your requirement, the operation you need is "put if absent, otherwise return the cached value". ConcurrentHashMap only prevents corruption of its internal structure; it doesn't prevent race conditions in compound check-then-act sequences. Since you are already using ConcurrentHashMap there is no data corruption, but how do we avoid the race condition? The answer is to perform the check and the update atomically. putIfAbsent is a check-if-absent-then-set method that runs as one atomic operation: no other thread can insert a value for the same key in the middle of a putIfAbsent call, so the first value stored for a key wins. That atomicity only covers each individual putIfAbsent call, though. It does not guarantee that the value for a key is computed only once: Java evaluates the argument before making the call, so expensiveGetResourceOperation is always invoked, whether or not the key is absent. This leads to redundant computation, and to possible inconsistency if the operation has side effects or its results can vary between calls. The map itself stays consistent, because putIfAbsent discards the newly computed value whenever the key is already present.
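The eager evaluation described for Approach-3 can be seen even on a single thread. The sketch below is illustrative only: PutIfAbsentDemo, expensiveLoad and the loads counter are hypothetical names, and String stands in for Resource. The loader tags each value with its invocation number so it is visible which value ends up cached.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class PutIfAbsentDemo {
    static final ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
    static final AtomicInteger loads = new AtomicInteger();

    // Hypothetical stand-in for the expensive operation; tags the value with the call number.
    static String expensiveLoad(String key) {
        return key + "#" + loads.incrementAndGet();
    }

    static String getResource(String key) {
        String fresh = expensiveLoad(key);               // always runs, even if the key is cached
        String existing = map.putIfAbsent(key, fresh);   // atomic: the first stored value wins
        return existing == null ? fresh : existing;
    }

    public static void main(String[] args) {
        String first = getResource("somekey");
        String second = getResource("somekey");
        System.out.println(first);        // prints "somekey#1"
        System.out.println(second);       // prints "somekey#1" (the second value was discarded)
        System.out.println(loads.get());  // prints "2": the loader still ran twice
    }
}
```

Both calls return the same cached value, so callers see consistent data, but the expensive work was done twice.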
Approach-4(efficient approach)
public Resource getResource(String key) {
    // Ensure only one thread computes the value for the key if it's absent
    return map.computeIfAbsent(key, k -> expensiveGetResourceOperation(k));
}
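Analysis: The semantics are similar to putIfAbsent, but the value is computed by the mapping function on demand (the lambda is evaluated lazily), only if the key is absent, so there is no redundant computation. On ConcurrentHashMap the entire method invocation is performed atomically, so the function is applied at most once per key. Efficiency: no locking of the entire map and no redundant key existence checks. Note that, per the ConcurrentHashMap documentation, some attempted update operations on the map by other threads may be blocked while the computation is in progress, and the mapping function must not try to update other mappings of the same map.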
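As a rough check of the at-most-once claim for Approach-4, the sketch below submits the same lookup from several pool threads and counts loader invocations. ComputeIfAbsentDemo, expensiveLoad, loads and loadsAfterConcurrentCalls are hypothetical names, String stands in for Resource, and the field is typed as ConcurrentHashMap because that is the class whose documentation gives the at-most-once guarantee.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class ComputeIfAbsentDemo {
    static final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    static final AtomicInteger loads = new AtomicInteger();

    // Hypothetical stand-in for the expensive, I/O-bound operation.
    static String expensiveLoad(String key) {
        loads.incrementAndGet();
        return "resource-for-" + key;
    }

    static String getResource(String key) {
        // The mapping function runs only if the key has no value yet.
        return map.computeIfAbsent(key, ComputeIfAbsentDemo::expensiveLoad);
    }

    // Runs the same lookup on several pool threads and returns the total load count.
    static int loadsAfterConcurrentCalls(String key, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<String> task = () -> getResource(key);
            List<Callable<String>> tasks = Collections.nCopies(threads, task);
            pool.invokeAll(tasks);           // returns once every task has completed
            return loads.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Eight concurrent lookups of one missing key; prints "loads = 1".
        System.out.println("loads = " + loadsAfterConcurrentCalls("somekey", 8));
    }
}
```

Unlike the synchronized (map) variant, threads asking for keys in different bins of the map are generally not held up by each other's loads.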
Other points:
OP's question: "It seems to me that since the checks are called on ConcurrentMap, it behaves like a shared resource that is declared to be volatile and thus prevents any of the "partial initialization" problems that may happen." The map field doesn't need to be volatile, but it should be final. If it's not final, the variable can be reassigned to a different map.
Upvotes: 0
Reputation: 768
The verdict is in. I timed 3 different solutions in nanosecond accuracy, since after all the initial question was about performance:
Fully synching the function on a regular HashMap:
synchronized (map) {
    Object result = map.get(key);
    if (result == null) {
        result = new Object();
        map.put(key, result);
    }
    return result;
}
first invocation: 15,000 nanoseconds, subsequent invocations: 700 nanoseconds
Using the double check lock with a ConcurrentHashMap:
if (!map.containsKey(key)) {
    synchronized (map) {
        if (!map.containsKey(key)) {
            map.put(key, new Object());
        }
    }
}
return map.get(key);
first invocation: 15,000 nanoseconds, subsequent invocations: 1500 nanoseconds
A different flavor of double checked ConcurrentHashMap:
Object result = map.get(key);
if (result == null) {
    synchronized (map) {
        if (!map.containsKey(key)) {
            result = new Object();
            map.put(key, result);
        } else {
            result = map.get(key);
        }
    }
}
return result;
first invocation: 15,000 nanoseconds, subsequent invocations: 1000 nanoseconds
You can see that the biggest cost was on the first invocation, and it was similar for all 3. Subsequent invocations were fastest on the regular HashMap with method sync, like user237815 suggested, but only by 300 nanoseconds. And after all, we are talking about nanoseconds here, which means billionths of a second.
Upvotes: 0
Reputation: 15799
In general, double-checked locking is safe if the variable you're checking and assigning is marked volatile. But you're better off synchronizing the entire function:
public synchronized Resource getResource(String key) {
    Resource resource = map.get(key);
    if (resource == null) {
        resource = expensiveGetResourceOperation(key);
        map.put(key, resource);
    }
    return resource;
}
The performance hit will be tiny, and you'll be certain that there will be no sync problems.
Edit:
This is actually faster than the alternatives, because you won't have to do two calls to the map in most cases. The only extra operation is the null check, and the cost of that is close to zero.
Second edit:
Also, you don't have to use ConcurrentMap. A regular HashMap will do it. Faster still.
Upvotes: 1
Reputation: 8899
If you can use external libraries, take a look at Guava's MapMaker.makeComputingMap(). It's tailor-made for what you're trying to do.
Upvotes: 4
Reputation: 45433
Yes, it's safe.
If map.containsKey(key) is true then, according to the doc, map.put(key, resource) happens before it. Therefore getResource(key) happens before resource = map.get(key), and everything is safe and sound.
Upvotes: 3
Reputation: 425003
No need for that: ConcurrentMap supports this with its atomic putIfAbsent method.
Don't reinvent the wheel: Always use the API where possible.
Upvotes: 0
Reputation: 8204
Why not use the putIfAbsent() method on ConcurrentMap?
if (!map.containsKey(key)) {
    map.putIfAbsent(key, getResource(key));
}
Conceivably you might call getResource() more than once, but it won't happen a bunch of times. Simpler code is less likely to bite you.
Upvotes: 2