What should I do if primary or secondary clustering occurs in hash table

I get what primary and secondary clustering are but how to get rid of what way to minimise them properly

Upvotes: 0

Answers (1)

Tony Delroy

Reputation: 106236

how to get rid of what way to minimise them properly

You can use a higher quality hash function to distribute the keys in a less collision-prone fashion. For some scenarios, the best practical hash function achievable has a kind of pseudo-random-but-repeatable placement property. In other cases, you might know something about the keys that lets you create a less collision-prone hash function - for example, you might know that the keys tend to be incrementing numbers, possibly with a few small gaps: in that case, an identity hash function h(n) = n will tend to place values in adjacent buckets, with less chance of collision than if the placements were more random.

In some cases, using prime numbers of buckets helps distribute elements better across the buckets than using a power-of-two bucket count. Basically, bucket counts that are powers of two are effectively masking out the high-order bits of the hash value when mapping onto the buckets: any randomness in the high order bits is discarded instead of helping to create a more uniform distribution across buckets. Still, bitwise masking is faster than a mod calculation on most hardware/CPUs.

You can also reduce the load factor: the ratio of elements to buckets. Clustering effects for hash tables using closed hashing get exponentially worse as the load factor approaches 1 (i.e. every bucket being full).

You could also stop using closed hashing and use separate chaining (maintaining containers of elements colliding at each bucket) instead, which doesn't suffer from primary clustering, but the indirection can lead to more memory usage overheads, indirection, and less optimal use of CPU cache, with consequently lower runtime performance - especially when the elements are small (a few bytes each).

You can also use multiple hash functions to identify successive buckets at which an element may be stored, rather than simple offers as in linear or quadratic probing, which reduces clustering. When you have alternative buckets, you can use techniques to move elements around to reduce the worse areas of clustering - search for robin hood hashing for example.

Upvotes: 0

What should I do if primary or secondary clustering occurs in hash table

Answers (1)

Related Questions