IUnknown

Reputation: 9829

Facts about ConcurrentHashMap

I have read a couple of statements on ConcurrentHashMap from various sources and wanted to verify whether they are indeed true.

  1. After the iterator for a ConcurrentHashMap is created, ONLY removal and update operations by other threads are guaranteed to be reflected. Does the iterator refresh its snapshot after an edit/remove? Why would the iterator treat an update/remove any differently from an add?

  2. ConcurrentHashMap shards its data into segments to reduce writer lock contention. The concurrencyLevel parameter directly specifies the number of shards that are created by the class internally. If we simply use the parameterless constructor and accept the default configuration, the map will instantiate the objects needed for 16 shards before you even add your first value… What do we mean by a shard here? Is it a bucket of data within the map, or a copy of the entire map? I understood it to be similar to a page in a database, which can be independently locked for update. Why would the concurrencyLevel impact memory then?
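For concreteness, these are the two construction paths I'm asking about; as far as I can tell from the docs, the explicit arguments below are just the defaults that the parameterless constructor uses anyway (the class name is only for illustration):

```java
import java.util.concurrent.ConcurrentHashMap;

class ChmConstructors {
    public static void main(String[] args) {
        // Parameterless constructor: accept the default configuration.
        ConcurrentHashMap<String, Integer> defaults = new ConcurrentHashMap<>();

        // Equivalent explicit form: initialCapacity = 16, loadFactor = 0.75f,
        // concurrencyLevel = 16 -- the value said to control the shard count.
        ConcurrentHashMap<String, Integer> explicit =
                new ConcurrentHashMap<>(16, 0.75f, 16);

        defaults.put("a", 1);
        explicit.put("a", 1);
    }
}
```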

Upvotes: 3

Views: 1538

Answers (3)

John Vint

Reputation: 40266

Does the iterator refresh its snapshot after an edit/remove? Why would the iterator treat an update/remove any differently from an add?

The CHM's iterator behavior is described in the API docs:

Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.

Meaning the iterator returned may or may not reflect changes that occur in the map while you are iterating. Imagine you create the iterator, traverse an entire segment, and move on to the next segment. After you move on, an add or remove is done in the segment you just finished traversing. You won't see that change, and that's OK; it does not violate the API.

As to your second question: add is implied; there is no difference with respect to visibility between add and remove.
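To illustrate what that weak consistency looks like in practice, here is a small self-contained sketch (my own example, not from the API docs). The iterator never throws ConcurrentModificationException, and entries added or removed after it was created may or may not show up:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WeaklyConsistentIteration {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();

        // Mutate the map after the iterator was created. A HashMap iterator
        // would fail fast here; the CHM iterator keeps going.
        map.put("c", 3);
        map.remove("a");

        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            // The new "c" entry and the removal of "a" may or may not be
            // visible, depending on how far the iterator had already advanced.
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```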

Why would the concurrencyLevel impact memory then?

Memory has been an issue with ConcurrentHashMap since it was released. Each unit of concurrency will, by default, create a single Segment. This segment has a HashEntry table and is also a ReentrantLock (so it carries all of the overhead that comes with one).
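A rough sketch of how that overhead adds up (the class name and the 100,000 figure are only for illustration; the actual per-segment footprint depends on the JVM). Constructing many small maps with the default concurrencyLevel of 16 versus an explicit concurrencyLevel of 1 makes the difference easy to see in a heap profiler:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

class SegmentOverheadSketch {
    public static void main(String[] args) {
        List<ConcurrentHashMap<String, String>> heavy = new ArrayList<>();
        List<ConcurrentHashMap<String, String>> light = new ArrayList<>();

        for (int i = 0; i < 100_000; i++) {
            // Default constructor: concurrencyLevel = 16, so (pre-Java 8)
            // sixteen Segments -- each a ReentrantLock with its own HashEntry
            // table -- exist even though each map holds a single entry.
            ConcurrentHashMap<String, String> h = new ConcurrentHashMap<>();
            h.put("k", "v");
            heavy.add(h);

            // Same data with concurrencyLevel = 1: one Segment per map,
            // which removes most of that fixed overhead.
            ConcurrentHashMap<String, String> l = new ConcurrentHashMap<>(1, 0.75f, 1);
            l.put("k", "v");
            light.add(l);
        }

        // Compare the two populations with a heap profiler (e.g. jvisualvm)
        // to see the Segment/ReentrantLock difference described above.
        System.out.println(heavy.size() + " heavy maps, " + light.size() + " light maps");
    }
}
```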

Java 8 releases CHMv8, which actually solves this issue.

You can read more about the memory overhead here, specifically:

Whilst doing a memory profile, JVisualVM showed that the top culprit was the ConcurrentHashMap.Segment class. The default number of segments per ConcurrentHashMap is 16. The HashEntry tables within the segments would probably be tiny, but each Segment is a ReentrantLock. Each ReentrantLock contains a Sync, in this case a NonFairSync, which is a subclass of Sync and then AbstractQueuedSynchronizer. Each of these contains a queue of nodes that maintain state of what is happening with your threads. It is used when fairness is determined. This queue and the nodes use a lot of memory.

Upvotes: 2

Zim-Zam O'Pootertoot

Reputation: 18158

  1. If you are referring to the entrySet enumeration, this will (eventually) reflect both add and remove operations performed on the parent object. However, the enumeration itself does not support add operations.

  2. Sharding in this case is essentially a hashtable of hashtables. Let's say that the ConcurrentHashMap's backing array contains 1024 entries. Without sharding this means that an object's hash value will map to an integer between [0, 1023]. With sharding this means that the backing array contains sixteen backing arrays of 64 entries, i.e. the [0, 1023] backing array is now an array from [0, 63], another array from [64, 127], etc. Let's say you're modifying an object that hashes to 100 - without sharding you would be locking the entire [0, 1023] backing array, but with sharding you're only locking the [64, 127] sub-array, allowing additional threads to modify the other shards. The more shards you have the greater the concurrency of the ConcurrentHashMap, however the more shards you have the more memory you need to maintain them. (This is not a multiplicative effect, however, where using 16 shards would multiply the total amount of memory used by 16. Instead, it's an additive effect; let's say that it takes 64 bytes to maintain each shard's data structure, so using 16 shards adds 1024 bytes to the data structure, and using 64 shards adds 4096 bytes to the data structure.)
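Here is a toy version of that index arithmetic (not the real implementation, which re-hashes the key and uses power-of-two masks, but the same idea): pick the shard first, then the bucket within the shard.

```java
class ShardIndexSketch {
    static final int TOTAL_BUCKETS = 1024;
    static final int SHARDS = 16;
    static final int BUCKETS_PER_SHARD = TOTAL_BUCKETS / SHARDS; // 64

    public static void main(String[] args) {
        int hash = 100; // an object whose hash maps to slot 100 of [0, 1023]

        int shard = hash / BUCKETS_PER_SHARD;             // 100 / 64 = 1 -> the [64, 127] shard
        int bucketWithinShard = hash % BUCKETS_PER_SHARD; // 100 % 64 = 36

        // Only the lock guarding shard 1 needs to be taken for this update;
        // threads touching the other 15 shards can proceed concurrently.
        System.out.println("shard=" + shard + ", bucket=" + bucketWithinShard);
    }
}
```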

Upvotes: 1

Joel

Reputation: 5674

Remember that a hash table is just an array at its core, with some cleverness to allow you to use non-integer keys and still get constant-time access.

  1. With respect to an add: adding an element may cause the hash table to grow, which creates a new underlying array and redistributes the elements of the table. The iterator for the hash is presumably still pointing at the old array so that it can keep its place.

  2. Generally, sharding in this context means that the map is split into some number of subarrays, so that threads working on different keys are unlikely to contend on the same one. For most instances it is actually better to specify a lower number of shards; 2 is usually sufficient unless you're running on a server with lots of contention on that table (a small constructor sketch follows the link below). See also:

http://ria101.wordpress.com/2011/12/12/concurrenthashmap-avoid-a-common-misuse/
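A minimal sketch of that advice, assuming the pre-Java-8 three-argument constructor: keep the default capacity and load factor, but ask for only two shards (the class and key names are just for illustration).

```java
import java.util.concurrent.ConcurrentHashMap;

class LowConcurrencyLevel {
    public static void main(String[] args) {
        // initialCapacity = 16, loadFactor = 0.75f, concurrencyLevel = 2:
        // two writers can update different shards at the same time, and the
        // per-map overhead is far smaller than with the default of 16.
        ConcurrentHashMap<String, Integer> counters =
                new ConcurrentHashMap<>(16, 0.75f, 2);

        counters.put("requests", 0);
        System.out.println(counters);
    }
}
```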

Upvotes: 0
