Reputation: 23
I wrote code which reads some words and their meanings from a file and maps them to an array (make hash table). It uses polynomial hash code and a compress method.
My goal is to reduce the collisions as less as possible but I don't know how.
public int hashcode(Entry my){
Object key=my.getKey();
int sum=0 ,z=33;
char[] chars = new char[key.toString().length()];
chars=key.toString().toCharArray();
for(int i=0; i < chars.length; i++){
sum += (chars[i])*Math.pow(z,i);
}
return sum;
}
and this is my compress method (for an array by size 100):
public int compress(int hashcode){
return hashcode%100;
}
Should I change my compress method or there are ways which may help me?
Upvotes: 2
Views: 4638
Reputation: 171
What you seem to be looking for is a perfect hash function, unfortunately, as far as I know, such a hash does not exist :)
Another thing to point out is that the performance of hash functions also varies by the type of result you want to achieve; what I mean is that a hash function might perform excellent for "storing" phone numbers but provide poor results for storing the contact's name.
From a quick look over your code I'd say that you're hash function is over complicated.
First I'd like to point out a problem with your current algorithm: this line 'sum+=(chars[i])*Math.pow(z,i);' will return values that are way beyond the integer range for words that are more than 4-5 characters long (just a guess). You'll probably say it's ok because it will overflow and so on but the truth is it won't because the sum+= syntax actually hides a type cast (try writing it as sum=sum+) and in such cases the sum will have the value of Integer.MAX_VALUE. This is probably why your algorithm is slow right now.
If I were you, for the purpose of a dictionary (which seems to be what you are trying to do) and assuming that Entry#getKey() is of type String, I would probably just go with:
public int hashcode(Entry my) {
return my.getKey().hashCode();
}
If you still want to come up with your own hash function why not go with something more simpler like: [word length + char code of first X letters + char code of the last letter] where you adapt X so the result will fit into an int. Just an idea :)
Upvotes: 2