zzfima

Reputation: 1565

What is the connection between collisions and the complexity of CRUD operations in a hash table?

In Aditya Bhargava's book "Grokking Algorithms: An Illustrated Guide for Programmers and Other Curious People" I read that the worst-case complexity can be avoided if we avoid collisions. As I understand it, a collision is when the hash function returns the same value for different keys. How does this affect hash table complexity for CRUD operations? Thanks

Upvotes: 0

Views: 246

Answers (2)

Tony Delroy

Reputation: 106096

I read that the worst-case complexity can be avoided if we avoid collisions.

That's correct - worst-case complexity happens when all the elements stored in a hash table hash to, and collide at, the same bucket.

As I understand it, a collision is when the hash function returns the same value for different keys.

Ultimately a value is mapped, using a hash function, to a bucket in the hash table. That said, it's common for that overall conceptual hash function to be implemented as a hash function producing a value in a huge numerical range (e.g. a 32-bit hash between 0 and 2^32-1, or a 64-bit hash between 0 and 2^64-1), and then have that value mapped onto a specific bucket based on the current hash table bucket count using the % operator. So, say your hash table has 137 buckets: you might generate a hash value of 139, then 139 % 137 == 2 means you use the third bucket ([2] in an array of buckets). This two-step approach makes it easy to use the same hash function (producing 32-bit or 64-bit hashes) regardless of the size of the table. If you instead created a hash function that produced numbers between 0 and 136 directly, it wouldn't work at all well for slightly smaller or larger bucket counts.
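As a minimal sketch of that two-step mapping (using Python's built-in hash() as a stand-in for the 32/64-bit hash function, the 137-bucket example above, and an arbitrary key "foo"):

```python
# Sketch of the two-step mapping: a wide hash value is reduced to a bucket
# index with the % operator. Python's hash() stands in for the 32/64-bit hash.
bucket_count = 137

def bucket_index(key):
    # The same wide hash function works for any bucket count.
    return hash(key) % bucket_count

print(139 % 137)            # 2 -> the third bucket, i.e. buckets[2]
print(bucket_index("foo"))  # some index in range(137); varies per key (and per run)
```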

Returning to your question...

As I understand it, a collision is when the hash function returns the same value for different keys.

...for the "32- or 64-bit hash function followed by %" approach I've described above, there are two distinct types of collisions: the 32- or 64-bit hash function itself may produce exactly the same 32- or 64-bit value for distinct values being hashed, or they might produce different values that - after the % operation - never-the-less map to the same bucket in the hash table.
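To make both kinds concrete, here's a hedged illustration leaning on Python's built-in hash() (the hash(-1) == hash(-2) equality is a CPython implementation quirk, used here only as a convenient example of distinct keys sharing a full hash value):

```python
# 1) Different keys, identical full hash value (CPython reserves -1 internally,
#    so hash(-1) and hash(-2) both return -2).
print(hash(-1) == hash(-2))   # True in CPython

# 2) Different full hash values that collide only after the % reduction onto
#    a 137-bucket table.
print(139 % 137, 2 % 137)     # 2 2 -> same bucket, different hash values
```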

How does this affect hash table complexity for CRUD operations?

Hash tables work by probabilistically spreading the values across the buckets. When many values collide at the same bucket, a secondary search mechanism has to be employed to process all the colliding values (and possibly other intermingled values, if you're using Open Addressing to try a sequence of buckets in the hash table, rather than hanging a linked list or binary tree of colliding elements off every bucket). So basically, the worse the collision rate, the further from idealised O(1) complexity you get, though you really only start to affect big-O complexity significantly if you have a particularly bad hash function, in light of the set of values being stored.
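For concreteness, here's a minimal separate-chaining sketch (the "linked list off every bucket" flavour, not Open Addressing; the class name and bucket count are arbitrary) showing where that secondary search happens:

```python
# Minimal separate-chaining table: each bucket holds a list of (key, value)
# pairs, and lookups scan that list - the "secondary search mechanism".
class ChainedHashTable:
    def __init__(self, bucket_count=137):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: update it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: append to the chain

    def get(self, key):
        # With a good hash this scan touches ~1 entry; if every key collided
        # into one bucket it would touch all n entries - the O(n) worst case.
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)
```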

Upvotes: 1

Jim Mischel

Reputation: 133995

In a hash table implementation that has a good hashing function and a load factor (number of entries divided by total capacity) of 70% or less, the number of collisions is fairly low and hash lookup is O(1).
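As a quick arithmetic sketch of that load factor figure (the numbers are illustrative only):

```python
# Load factor = entries / capacity; 70 entries in a 100-bucket table sits
# right at the 70% threshold mentioned above.
entries, capacity = 70, 100
print(entries / capacity)   # 0.7
```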

If you have a poor hashing function or your load factor starts to increase, then the number of collisions increases. If you have a poor hashing function, then some hash codes will have many collisions and others will have very few. Your average lookup rate might still be close to O(1), but some lookups will take much longer because collision resolution takes a long time. For example, if hash code value 11792 has 10 keys mapped to it, then you potentially have to check 10 different keys before you can return the matching key.
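A hypothetical sketch of that skew, using a deliberately poor hash function (poor_hash and the key list are invented for illustration):

```python
from collections import Counter

# A deliberately poor hash function: it only looks at the first character,
# so many distinct keys share a hash code while other codes go unused.
def poor_hash(key):
    return ord(key[0])

keys = ["apple", "apricot", "avocado", "banana", "blueberry", "cherry"]
chain_lengths = Counter(poor_hash(k) % 8 for k in keys)
print(chain_lengths)   # the three 'a...' keys pile up in one bucket (chain of 3)
```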

If the hash table is overloaded, with each hash code having approximately the same number of keys mapped to it, then your average lookup rate will be O(k), where k is the average number of collisions per hash code.

Upvotes: 1
