Reputation: 75

Handling hash collisions when using linear probing

I have read about hashtable and open adrdessing. If you want to insert the keys: 18,32,44 in an hashtable with size 13:

18 gets index 5 (18 modulus 13 = 5)
32 gets index 6 (32 modulus 13 = 6)
44 gets index 5 (44 modulus 13 = 5)

You'll get a collision because there are already something on index 5.

If you use linear probing you'll do hashfunction = (key+i) modulus N where i = 0,1,2.. until you find an empty place in the hashtable. Then 44 will get be inserted at index 7.

What if you delete 32, and then you want to delete 44. You start by looking at hashfunction(44)=5 - that was not 44, then hashfunction(44 + 1) = 6 - that is empty. Then you might think that 44 is gone. How do you mark a place in the hashtable, that the place is not really empty, but does not contain a key, and that you should keep looking for 44 at the next index?

If you then need to insert another key at index 6 then the key just overwrites the "mark" in the hashtable.

What could you use to mark an index - saying here has been an key, but has been deleted - so you continue to look at next index? You can't just write null or 0 because then either you think the key has been deleted (null) or that an key with value 0 has overwritten 44.

Upvotes: 5

Answers (4)

NullUserException

Reputation: 85478

One way to handle hash tables using open addressing is to use state marks: EMPTY, OCCUPIED and DELETED. Note that there's an important distinction between EMPTY, which means the position has never been used and DELETED, which means it was used but got deleted.

When a value gets removed, the slot is marked as DELETED, not EMPTY. When you try to retrieve a value, you'll probe until you find a slot that's mark EMPTY; eg: you consider DELETED slots to be the same as OCCUPIED. Note that insertion can ignore this distinction - you can insert into a DELETED or EMPTY slot.

The question is tagged Java, which is a bit misleading because Java (or at least Oracle's implementation of it) does not use open addressing. Open addressing gets specially problematic when the load factor gets high, which causes hash collisions to occur much more often:

enter image description here

As you can see, there's a dramatic performance drop near the 0.7 mark. Most hashtables get resized once their load factor gets past a certain constant factor. Java for example doubles the size of its HashMap when the load factor gets past 0.75.

Upvotes: 6

Albin

Reputation: 4220

It seems like you are trying to implement your own hash table (in contrast to using the Hashtable or HashMap included in java), so it's more a data structure question than a java question.

That being said, implementing a hash table with open addressing (such as linear probing) is not very efficient when it comes to removing elements. The normal solution is to "pull up" all elements that are in the wrong slot so there won't be any spaces in the probing.

There is some pseudo code describing this quite well at wikipedia:

http://en.wikipedia.org/wiki/Open_addressing

Upvotes: 2

Peter Lawrey

Reputation: 533870

If you use a hash table which uses this approach (which none of the built in hash collections do) you need traverse all the latter keys to see if they need to be moved up (to avoid holes). Some might be for the same hash value and some might be collisions for unrelated hash codes. If you do this you are not left with any holes. For a hash map which is not too full, this shouldn't create much overhead.

Upvotes: 0

Eric Rosenberg

Reputation: 1533

The hash table buckets aren't limited to storing a single value. So if two objects hash to the same location in the table they will both be stored. The collision only means that lookup will be slightly slower because when looking for the value with a key that hashes to a particular location it will need to check each entry to see if it matches

It sounds like you are describing a hash table where you only store a single entry and each index. The only way I can think to do that is to add a field to the structure storing the value that indicates if that position had a collision. Then when doing a lookup you'd check the key, if it was a match you have the value. If not, then you would check to see if there was a collision and then check the next position. On a removal you'd have to leave the collision marker but delete the value and key.

Upvotes: 0

Handling hash collisions when using linear probing

Answers (4)

Related Questions