I'm in my first semester, and as part of a computer science assignment I have to implement a simple hash map using vectors, but I'm having trouble understanding the concept.
First of all, I have to implement a hash function. To handle collisions, I thought it would be better to use double hashing, as follows:
do {
    h = (k % m + j * (1 + k % (m - 2))) % m;  // final % m keeps h inside the table
    j++;
} while (j % m != 0);
where h is the hash to be returned, k is the key, j counts the probe attempts, and m is the size of the hash map (a prime number; all of these are of type int).
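The loop above recomputes h for every j but never checks whether slot h is free, so it may help to look at the probe positions on their own. The following is a minimal sketch, assuming non-negative keys and a prime m of at least 3; `probe` and `probeSequence` are hypothetical helper names. Because m is prime and the step 1 + k % (m - 2) lies between 1 and m - 2, the step and m share no common factor, so j = 0, 1, ..., m - 1 visits every slot exactly once.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: index of the j-th probe for key k in a table of
// prime size m, using the double-hashing formula from the question.
int probe(int k, int j, int m) {
    assert(k >= 0 && m >= 3);       // m - 2 must be positive
    int start = k % m;              // first hash: where probing begins
    int step = 1 + k % (m - 2);     // second hash: never 0, at most m - 2
    return (start + j * step) % m;  // final % m keeps the index inside the table
}

// Lists the whole probe sequence j = 0, 1, ..., m - 1 for key k.
std::vector<int> probeSequence(int k, int m) {
    std::vector<int> seq;
    for (int j = 0; j < m; ++j) {
        seq.push_back(probe(k, j, m));
    }
    return seq;
}
```

For example, with m = 7 and key 12 the start is 5 and the step is 3, so `probeSequence(12, 7)` yields 5, 1, 4, 0, 3, 6, 2: all seven slots, each once. Insert, find and remove would walk this sequence and stop at the first suitable slot.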
This was easy, but then I need to be able to insert or remove a key and its corresponding value in the map.
Both functions should return bool, so I have to return either true or false. I'm guessing that insert should return true when there is no element at position h in the vector. (But I have no idea why remove should return bool as well.)
My problem is what to do when insert would return false (i.e. when a key-value pair is already stored at position h; I implemented this check as a function named find). I could obviously move the pair to the next free place by simply increasing j, but then the hash calculated by my hash function would no longer tell us where a given key is stored, which would make the remove function behave incorrectly.
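The usual way around this is that find and remove do not look only at the first position: they walk the same probe sequence that insert used, and each slot stores the key itself, so you can tell whether you have reached the right one. Below is a minimal sketch for the double hashing in the question, assuming non-negative int keys and values and a fixed, prime table size of at least 3; the class name `DoubleHashMap`, the `Slot` struct and the `Deleted` state are hypothetical. It also shows one common reading of the bool results: insert returns false when the key is already present or the table is full, and remove returns false when there was nothing to remove. Removed slots are marked `Deleted` (a "tombstone") instead of being emptied, because an empty slot would make a later search stop early and miss keys placed further along the same probe sequence.

```cpp
#include <cassert>
#include <vector>

// Sketch of an open-addressing hash map using the question's double hashing.
class DoubleHashMap {
    enum class State { Empty, Used, Deleted };  // Deleted = tombstone
    struct Slot {
        int key = 0;
        int value = 0;
        State state = State::Empty;
    };
    std::vector<Slot> slots;
    int m;  // table size, assumed prime and >= 3

    int probe(int k, int j) const {
        return (k % m + j * (1 + k % (m - 2))) % m;
    }

    // Index of the slot holding key k, or -1. Follows the same probe
    // sequence as insert, so a collision cannot hide the key.
    int indexOf(int k) const {
        assert(k >= 0);
        for (int j = 0; j < m; ++j) {
            int i = probe(k, j);
            if (slots[i].state == State::Empty) return -1;  // insert would have used this slot
            if (slots[i].state == State::Used && slots[i].key == k) return i;
            // Deleted slot or a different key: keep probing
        }
        return -1;
    }

public:
    explicit DoubleHashMap(int size) : slots(size), m(size) { assert(size >= 3); }

    // Returns false if the key is already present or the table is full.
    bool insert(int k, int v) {
        if (indexOf(k) != -1) return false;
        for (int j = 0; j < m; ++j) {
            Slot& s = slots[probe(k, j)];
            if (s.state != State::Used) {  // reuse Empty or Deleted slots
                s.key = k;
                s.value = v;
                s.state = State::Used;
                return true;
            }
        }
        return false;
    }

    // Returns false if the key was not in the map.
    bool remove(int k) {
        int i = indexOf(k);
        if (i == -1) return false;
        slots[i].state = State::Deleted;  // tombstone keeps later keys reachable
        return true;
    }

    // Copies the value for key k into out; returns false if k is absent.
    bool find(int k, int& out) const {
        int i = indexOf(k);
        if (i == -1) return false;
        out = slots[i].value;
        return true;
    }
};
```

With m = 7, keys 3 and 10 both start at slot 3, so 10 ends up one step further along its own probe sequence (slot 4). After `remove(3)`, slot 3 holds a tombstone and `find(10, v)` still succeeds because it probes past it.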
Is there any good example online that doesn't use the predefined std library methods? (My Google has behaved weirdly for the past few days and only returns unhelpful results in the local language.)
I was asked to move my comment to an answer, so here it is. I'm presuming your get method takes the value you are looking for as an argument.
So what we are going to use is a technique called linear probing.
When we insert a value, we hash it as normal. Let's say our hash value is 4 (counting positions from 1) and the table looks like this, where x marks an occupied slot:
[x,x,x, , ,x,x]
Slot 4 is free, so we can simply insert our value (v) there:
[x,x,x,v, ,x,x]
However, if slot 4 is already taken when we insert, we simply move on to the next empty slot, here slot 5:
[x,x,x,x,v,x,x]
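In linear probing, if we reach the end of the table we wrap round to the beginning and keep looking until we find a free slot. Since you are using a vector, you shouldn't run out of space as long as you grow it before it gets full. Note, though, that growing the vector alone isn't enough: every stored element has to be re-inserted (rehashed), because its position depends on the table size.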
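As a rough sketch of growing with rehashing, here is a small linear-probing set of non-negative ints. The class name `ProbingSet`, the `EMPTY = -1` marker, doubling the size, and the maximum load factor of 0.5 are all assumptions chosen for illustration. With the double hashing from the question you would instead grow to a prime size, since that scheme relies on m being prime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: linear-probing set that grows and rehashes when it gets half full.
class ProbingSet {
    static constexpr int EMPTY = -1;  // assumption: stored values are >= 0
    std::vector<int> node;
    std::size_t count = 0;

    std::size_t hash(int value) const {
        return static_cast<std::size_t>(value) % node.size();
    }

    // Linear probing; assumes at least one free slot exists.
    void place(int value) {
        std::size_t h = hash(value);
        while (node[h] != EMPTY) {
            h = (h + 1) % node.size();  // wrap round to 0 at the end
        }
        node[h] = value;
    }

    void grow() {
        std::vector<int> old;
        old.swap(node);                      // take the old contents out
        node.assign(old.size() * 2, EMPTY);  // new, larger, empty table
        for (int value : old) {
            if (value != EMPTY) place(value);  // position depends on node.size()
        }
    }

public:
    explicit ProbingSet(std::size_t size = 8) : node(size, EMPTY) { assert(size > 0); }

    bool insert(int value) {
        assert(value >= 0);
        if (contains(value)) return false;
        if (2 * (count + 1) > node.size()) grow();  // keep load factor <= 0.5
        place(value);
        ++count;
        return true;
    }

    bool contains(int value) const {
        std::size_t h = hash(value);
        for (std::size_t i = 0; i < node.size(); ++i) {
            if (node[h] == EMPTY) return false;
            if (node[h] == value) return true;
            h = (h + 1) % node.size();
        }
        return false;
    }

    std::size_t capacity() const { return node.size(); }
    std::size_t size() const { return count; }
};
```

Starting from 4 slots, inserting 0 to 4 grows the table twice (to 8, then 16 slots), and each existing value is placed again using the new size.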
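This causes a problem when searching, because the value that hashed to 4 may not be at 4 anymore (in this case it's at 5). To solve this we use a little trick. Note that insertion and retrieval still take O(1) time on average, as long as the load factor (the number of stored elements divided by the table size) stays well below 1; the closer it gets to 1, the longer the runs of occupied slots we have to walk through.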
In our get method, instead of returning whatever is in the array at position 4, we start looking for our value at 4: if it's there, we return it. If not, we look at position 5, and so on, until we find the value.
In pseudocode, the new parts look like this:
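bool insert(value){
    h = hash(value);
    probes = 0;
    while(node[h] != null){
        if(node[h] == value)        // already stored: nothing to insert
            return false;
        h = (h + 1) % node.length;  // next slot, wrapping round to 0 at the end
        probes++;
        if(probes == node.length)   // went all the way round: table is full
            return false;
    }
    node[h] = value;
    return true;
}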
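Here is a C++ version of the insert pseudocode, as a sketch that assumes the table is a `std::vector<int>` of non-negative values with `EMPTY = -1` marking a free slot, and that the hash is simply `value % node.size()`; the names `EMPTY` and `insertValue` are hypothetical. It returns false for a duplicate or a full table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int EMPTY = -1;  // assumption: marks a free slot; stored values are >= 0

// Linear-probing insert. Returns false if the value is already stored
// or every slot is taken.
bool insertValue(std::vector<int>& node, int value) {
    assert(value >= 0 && !node.empty());
    std::size_t h = static_cast<std::size_t>(value) % node.size();  // hash(value)
    for (std::size_t probes = 0; probes < node.size(); ++probes) {
        if (node[h] == EMPTY) {  // free slot: store the value here
            node[h] = value;
            return true;
        }
        if (node[h] == value) return false;  // duplicate
        h = (h + 1) % node.size();           // next slot, wrapping round to 0
    }
    return false;  // went all the way round: table is full
}
```

In a 7-slot table, 10 and 17 both hash to index 3 (position 4 when counting from 1), so 17 lands at index 3 + 1 = 4, mirroring the diagram above. Likewise 13 hashes to the last index, 6; if 6 is already stored there, 13 wraps round to index 0.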
And get:
get(value){
    h = hash(value);
    probes = 0;                    // used to see if we have gone all the way round
    while(probes < node.length){
        if(node[h] == null)        // empty slot: insert would have put the value here,
            return -1;             // so it is not in the table
        if(node[h] == value)
            return node[h];
        h = (h + 1) % node.length; // next slot, wrapping round to 0 at the end
        probes++;
    }
    return -1;                     // checked every slot once: not found
}
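A matching C++ sketch of get, under the same assumptions as a plain vector-based table (`std::vector<int>` of non-negative values, hypothetical `EMPTY = -1` marker, hash `value % node.size()`). It returns the slot index instead of the value, since in this set-like table the stored value equals the argument anyway. Stopping at the first empty slot is safe here because insert would have placed the value there. If you also add a remove, clearing a slot back to `EMPTY` would break that early stop for values further along the run, which is why removed slots are commonly marked with a separate "deleted" marker that searches skip over.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int EMPTY = -1;  // assumption: marks a free slot; stored values are >= 0

// Linear-probing lookup. Returns the index of the slot holding value,
// or -1 if the value is not stored.
int findValue(const std::vector<int>& node, int value) {
    assert(value >= 0 && !node.empty());
    std::size_t h = static_cast<std::size_t>(value) % node.size();  // same hash as insert
    for (std::size_t probes = 0; probes < node.size(); ++probes) {
        if (node[h] == EMPTY) return -1;  // insert would have stopped here
        if (node[h] == value) return static_cast<int>(h);
        h = (h + 1) % node.size();        // wrap round to 0 at the end
    }
    return -1;  // every slot checked once
}
```

For the table {13, EMPTY, EMPTY, 10, 17, EMPTY, 6}, looking up 17 starts at index 3, finds 10 there, and moves on to index 4; looking up 24 also starts at 3 and stops at the empty index 5.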