brainydexter

Reputation: 20356

How can I improve the implementation of my own hash map?

For learning purposes, I'm writing my own implementation of a hash map. I'm using separate chaining with list heads as my collision-handling scheme.

This is what the structure would look like:

| 0   | ---> | 11 | ---> | 33 | ---> | -- | ---> | 121 | ---> | TAIL |
| 1   | ---> | 12 | ---> | 34 | ---> | -- | ---> | 122 | ---> | TAIL |
| -   |
| -   |
| -   |
| D-1 | ---> | -- | ---> | -- | ---> | -- | ---> | -- | ---> | TAIL |

It is an array of linked lists where,

D = size of the array,

| 11 | = element with key 11, AND elements are inserted into each list in sorted order

Algorithm:

void Insert(key, value):
 int bucket = hash_fn(key); // key % D, for now
 // accessing this bucket (array index) is O(1)
 // insert into the linked list at the correct sorted position
 LL[bucket]->insert(new object(key, value))

bool Lookup(key):
 int bucket = hash_fn(key); // key % D, for now
 // walk LL[bucket] looking for key; return true if found, false otherwise
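Roughly, in C++ the idea would look something like this (a minimal sketch; the class and member names are placeholders, not my exact code):

#include <list>
#include <utility>
#include <vector>

// Minimal sketch: an array of D buckets, each bucket a linked list of
// (key, value) pairs kept sorted by key.
class HashMap {
    std::vector<std::list<std::pair<int, int>>> buckets;

public:
    explicit HashMap(int D) : buckets(D) {}

    int hash_fn(int key) const { return key % static_cast<int>(buckets.size()); } // key % D, for now

    void Insert(int key, int value) {
        std::list<std::pair<int, int>>& chain = buckets[hash_fn(key)]; // O(1) bucket access
        auto it = chain.begin();
        while (it != chain.end() && it->first < key)                   // keep the chain sorted
            ++it;
        chain.insert(it, std::make_pair(key, value));
    }

    bool Lookup(int key) const {
        const std::list<std::pair<int, int>>& chain = buckets[hash_fn(key)]; // O(1) bucket access
        for (const auto& kv : chain)                                   // O(length of chain) scan
            if (kv.first == key) return true;
        return false;
    }
};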

Concern: If a lot of elements get mapped to the same bucket, search within that bucket is no longer O(1); in the worst case it degrades to O(n).

How can I improve this ?

Upvotes: 0

Views: 240

Answers (3)

Erik P.

Reputation: 1627

You can't. That's why it's crucial to prevent this from happening by using a good hash function that spreads items evenly over the buckets, and making sure you use enough buckets.
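For instance (an illustration only, not your code), even for integer keys a simple multiplicative mixing step spreads keys that share a pattern far better than a bare key % D:

#include <cstdint>

// Illustration only: Knuth-style multiplicative hashing. The multiplier is a
// commonly used 32-bit constant close to 2^32 / phi; it scatters keys that
// would all collide under a plain key % D (e.g. keys that are multiples of D).
inline std::uint32_t hash_fn(std::uint32_t key, std::uint32_t D) {
    return (key * 2654435761u) % D;
}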

If you're willing to stray from the idea of a hash table with linked lists for the buckets, you can try putting some other data structure in the buckets - for example some sort of self-balancing tree, like a red-black or AVL one to get O(log(m)) behaviour, where m is the maximum number of entries per bucket. But that will not actually make you happy. Simply using a good hash function will.
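If you did try that, each bucket could for example be a std::map (typically a red-black tree), which makes the per-bucket work O(log m). A rough sketch, not something I'd recommend over fixing the hash function:

#include <map>
#include <vector>

// Sketch only: buckets are balanced trees instead of linked lists, so a
// lookup inside one bucket costs O(log m) instead of O(m).
class TreeBucketHashMap {
    std::vector<std::map<int, int>> buckets;

public:
    explicit TreeBucketHashMap(int D) : buckets(D) {}

    void Insert(int key, int value) {
        int b = key % static_cast<int>(buckets.size());
        buckets[b][key] = value;                 // O(log m) insert within the bucket
    }

    bool Lookup(int key) const {
        int b = key % static_cast<int>(buckets.size());
        return buckets[b].count(key) > 0;        // O(log m) search within the bucket
    }
};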

Upvotes: 2

Serdalis

Reputation: 10489

What you can do is hashing with chaining, which uses a linked list per bucket to handle collisions within the hash table.
This keeps your lookups close to constant on average, as long as elements are spread reasonably evenly across the buckets.

However, with a good enough hash function you would not need to worry about this unless you are expecting the hash table to get close to capacity.

This Wikipedia article also contains some very good information about this technique.

Upvotes: 0

Ivaylo Strandjev

Reputation: 70929

From wikipedia:

With a good hash function, the average lookup cost is nearly constant as the load factor increases from 0 up to 0.7 (about 2/3 full) or so. Beyond that point, the probability of collisions and the cost of handling them increases.

So with a good enough hash function and a reasonably large hash table, you should not be concerned about that.
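A rough sketch of keeping the table "reasonably big" (the 0.7 threshold, the doubling policy, and all names here are just illustrative):

#include <list>
#include <utility>
#include <vector>

// Sketch only: keep the load factor n / D below ~0.7 by doubling the number
// of buckets and re-inserting every element once the threshold is crossed.
class ResizingHashMap {
    std::vector<std::list<std::pair<int, int>>> buckets;
    std::size_t num_elements;

    std::size_t bucket_of(int key, std::size_t D) const {
        return static_cast<std::size_t>(key) % D;
    }

    void rehash() {
        std::vector<std::list<std::pair<int, int>>> bigger(2 * buckets.size());
        for (const auto& chain : buckets)
            for (const auto& kv : chain)
                bigger[bucket_of(kv.first, bigger.size())].push_back(kv); // re-bucket with the new D
        buckets.swap(bigger);
    }

public:
    explicit ResizingHashMap(std::size_t D = 8) : buckets(D), num_elements(0) {}

    void Insert(int key, int value) {
        if (static_cast<double>(num_elements + 1) / buckets.size() > 0.7)
            rehash();                                                     // keep load factor under ~0.7
        buckets[bucket_of(key, buckets.size())].push_back(std::make_pair(key, value));
        ++num_elements;
    }
};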

Upvotes: 0
