When to use concurrenthashmap vs hashmap for put and get

In an interview question I was asked to explain a situation when using a concurrenthashmap would be the right way vs using a hashmap. On the board there were two columns t1 and t2 (corresponding to thread1 and thread2), and I was supposed to write a sequence of actions (like map.put(2, 10), map.get(2), etc) that using a concurrenthashmap vs a hashmap would produce the expected result.

I tried to give an example with an iterator but that was not what the interviewer looking for. He was looking for a sequence of put and get operations for thread1 and thread2. He said assume we never iterate and we only have put and get operations.

I looked at other threads on SO and verified my knowledge of thread safety but still I can't think of any example for put and gets producing wrong result with hashmap and correct result with concurrenthashmap. Is there any sequence of puts and gets, or I should have said not possible?

Upvotes: 8

Answers (2)

Erwin Bolwidt

Reputation: 31299

There are many ways in which they can differ - since HashMap is not protected against concurrent access from multiple threads, you could breaks its internal data structure entirely.

However most often you get more benign effects. The below code should put 2000 entries in each map from multiple threads. But for the HashMap, there will be consistently fewer entries than 2000 in the map after the operation, since some of the puts will clash with each other and their result will be lost.

public class BreakingMap {
    public static void testIt(Map<Integer, Integer> map) throws InterruptedException {
        IntStream.range(0, 2000).parallel().forEach(i -> map.put(i, -1));
        System.out.println(map.size());
    }

    public static void main(String[] args) throws InterruptedException {
        testIt(new HashMap<>());
        testIt(new ConcurrentHashMap<>());
    }
}

Upvotes: 6

Matt Timmermans

Reputation: 59253

That's an interesting question.

The correct answer is:

There are few realistic cases where which a sequence of get and put operations on a ConcurrentHashMap will produce the expected result in a multithreaded scenario. Instead of put(), you almost always need to use the atomic compare-and-mutate operations like computeIfAbsent() to do anything useful. One exception is the case when you use the map as a cache and the possibility of having multiple threads compute the same entry is more efficient than blocking while one thread does it... but then do you really need a cache? Not very often.

Just for the record, that would look like this:

Thread1 + Thread2 (they both do the same thing)
-----------------------------------------------

result = map.get(key);
if (result == null) {
    result = somewhat_expensive_function(key)
    map.put(key, result);
}
return result;

On the other hand, using a normal HashMap across two threads when one may be modifying the map while the other is also using it can lead to undefined behavior -- results not consistent with any sequence of operations, null pointer exceptions, or even a permanently corrupted data structure.

If I were asking this question in an interview, what I would be testing is: does the candidate understand that using thread-safe data structures doesn't make his algorithm thread-safe?

Upvotes: 1

When to use concurrenthashmap vs hashmap for put and get

Answers (2)

Related Questions