zzfima

Reputation: 1565

What is the connection between collisions and the complexity of CRUD operations in a hash table?

In Aditya Bhargava's book "Grokking Algorithms: An Illustrated Guide for Programmers and Other Curious People" I read that the worst-case complexity can be avoided if we avoid collisions. As I understand it, a collision is when the hash function returns the same value for different keys. How does this affect hash table complexity for CRUD operations? Thanks

Upvotes: 0

Views: 246

Answers (2)

Tony Delroy

Reputation: 106096

I read that the worst-case complexity can be avoided if we avoid collisions.

That's correct - worst-case complexity happens when all the elements stored in a hash table hash to, and collide at, the same bucket.

As I understand it, a collision is when the hash function returns the same value for different keys.

Ultimately a value is mapped, using a hash function, to a bucket in the hash table. That said, it's common for that overall conceptual hash function to be implemented as a hash function producing a value in a huge numerical range (e.g. a 32-bit hash between 0 and 2^32-1, or a 64-bit hash between 0 and 2^64-1), and then have that value mapped onto a specific bucket based on the current hash table bucket count using the % operator. So, say your hash table has 137 buckets: you might generate a hash value of 139, then 139 % 137 == 2 means you use the third bucket ([2] in an array of buckets). This two-step approach makes it easy to use the same hash function (producing 32-bit or 64-bit hashes) regardless of the size of the table. If you instead created a hash function that produced numbers between 0 and 136 directly, it wouldn't work at all well for slightly smaller or larger bucket counts.
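As a minimal sketch of that two-step mapping (using Python's built-in hash() as a stand-in for the 32/64-bit hash function, the 137-bucket example above, and an arbitrary key "foo"):

```python
# Sketch of the two-step mapping: a wide hash value is reduced to a bucket
# index with the % operator. Python's hash() stands in for the 32/64-bit hash.
bucket_count = 137

def bucket_index(key):
    # The same wide hash function works for any bucket count.
    return hash(key) % bucket_count

print(139 % 137)            # 2 -> the third bucket, i.e. buckets[2]
print(bucket_index("foo"))  # some index in range(137); varies per key (and per run)
```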

Returning to your question...

As I understand it, a collision is when the hash function returns the same value for different keys.

...for the "32- or 64-bit hash function followed by %" approach I've described above, there are two distinct types of collisions: the 32- or 64-bit hash function itself may produce exactly the same 32- or 64-bit value for distinct values being hashed, or they might produce different values that - after the % operation - never-the-less map to the same bucket in the hash table.
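To make both kinds concrete, here's a hedged illustration leaning on Python's built-in hash() (the hash(-1) == hash(-2) equality is a CPython implementation quirk, used here only as a convenient example of distinct keys sharing a full hash value):

```python
# 1) Different keys, identical full hash value (CPython reserves -1 internally,
#    so hash(-1) and hash(-2) both return -2).
print(hash(-1) == hash(-2))   # True in CPython

# 2) Different full hash values that collide only after the % reduction onto
#    a 137-bucket table.
print(139 % 137, 2 % 137)     # 2 2 -> same bucket, different hash values
```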

How does this affect hash table complexity for CRUD operations?

Hash tables work by probabilistically spreading the values across the buckets. When many values collide at the same bucket, a secondary search mechanism has to be employed to process all the colliding values (and possibly other intermingled values, if you're using Open Addressing to try a sequence of buckets in the hash table, rather than hanging a linked list or binary tree of colliding elements off every bucket). So basically, the worse the collision rate, the further from idealised O(1) complexity you get, though you really only start to affect big-O complexity significantly if you have a particularly bad hash function, in light of the set of values being stored.
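For concreteness, here's a minimal separate-chaining sketch (the "linked list off every bucket" flavour, not Open Addressing; the class name and bucket count are arbitrary) showing where that secondary search happens:

```python
# Minimal separate-chaining table: each bucket holds a list of (key, value)
# pairs, and lookups scan that list - the "secondary search mechanism".
class ChainedHashTable:
    def __init__(self, bucket_count=137):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: update it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: append to the chain

    def get(self, key):
        # With a good hash this scan touches ~1 entry; if every key collided
        # into one bucket it would touch all n entries - the O(n) worst case.
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)
```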

Upvotes: 1

Jim Mischel

Reputation: 133995

In a hash table implementation that has a good hashing function and a load factor (number of entries divided by total capacity) of 70% or less, the number of collisions is fairly low and hash lookup is O(1).
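As a quick arithmetic sketch of that load factor figure (the numbers are illustrative only):

```python
# Load factor = entries / capacity; 70 entries in a 100-bucket table sits
# right at the 70% threshold mentioned above.
entries, capacity = 70, 100
print(entries / capacity)   # 0.7
```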

If you have a poor hashing function or your load factor starts to increase, then the number of collisions increases. If you have a poor hashing function, then some hash codes will have many collisions and others will have very few. Your average lookup rate might still be close to O(1), but some lookups will take much longer because collision resolution takes a long time. For example, if hash code value 11792 has 10 keys mapped to it, then you potentially have to check 10 different keys before you can return the matching key.
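A hypothetical sketch of that skew, using a deliberately poor hash function (poor_hash and the key list are invented for illustration):

```python
from collections import Counter

# A deliberately poor hash function: it only looks at the first character,
# so many distinct keys share a hash code while other codes go unused.
def poor_hash(key):
    return ord(key[0])

keys = ["apple", "apricot", "avocado", "banana", "blueberry", "cherry"]
chain_lengths = Counter(poor_hash(k) % 8 for k in keys)
print(chain_lengths)   # the three 'a...' keys pile up in one bucket (chain of 3)
```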

If the hash table is overloaded, with each hash code having approximately the same number of keys mapped to it, then your average lookup rate will be O(k), where k is the average number of collisions per hash code.

Upvotes: 1
