Reputation: 1823
I'm building a hash table to store keys in the range 2000000-20000000.
Examples: 17664658-8, 7587458-8, 7338375-4, 5741259-2, ...
In a sample of 100000 elements the number of collisions is about 23939, and in a sample of 1000000 elements it is about 439870. I don't know much about hash tables, but aren't these collision counts a little high?
I have read that for a limited range of numbers you can get a hash function that is fairly uniform, but I don't know how or where to start. Any advice?
This is my hash function:
#include <ctype.h>
#include <math.h>

int hash(char* clave, int m) { // m is the size of the table (about double the number of elements to be stored)
    int number = 0, i = 0;
    // parse the leading digits; the last two characters ('-' and the check digit) are not used
    while (isdigit(clave[i])) {
        number += (clave[i++] - '0');
        if (isdigit(clave[i]))
            number *= 10;
    }
    // multiplication method
    float dis, R;
    R = 0.6106154;
    dis = R * number - floor(R * number);
    int result = (int)(dis * m);
    return result;
}
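A minimal driver to exercise the function could look like the sketch below (the keys are the sample values above, the table size m = 178144 is only an illustrative figure, and hash() from the snippet above is assumed to be in scope):

#include <stdio.h>

int hash(char* clave, int m); // the function above

int main(void) {
    // sample keys from the question; m is an illustrative table size
    char* keys[] = {"17664658-8", "7587458-8", "7338375-4", "5741259-2"};
    int m = 178144;
    for (int i = 0; i < 4; i++)
        printf("%s -> bucket %d\n", keys[i], hash(keys[i], m));
    return 0;
}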
Upvotes: 2
Views: 3616
Reputation: 24417
No, the number of collisions is not too high; in fact, it's about what you'd expect. The formula for the expected number of collisions in a hash table with a uniform random hash function, m buckets, and n insertions is:
n - m * (1 - ((m-1)/m)^n)
For your case:
m = 178144
n = 100000
Plugging the numbers in gives:
100000 - 178144 * (1 - ((178144-1)/178144) ^ 100000)
= 23476.674
and the observed number of collisions is 23939. So there is nothing wrong with your hash function.
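If you want to check this yourself, here is a minimal sketch that just evaluates the formula above in C (the values of m and n are the ones from your sample):

#include <stdio.h>
#include <math.h>

int main(void) {
    double m = 178144.0;  // number of buckets
    double n = 100000.0;  // number of insertions
    // expected collisions = insertions minus the expected number of occupied buckets
    double expected = n - m * (1.0 - pow((m - 1.0) / m, n));
    printf("expected collisions: %.3f\n", expected);  // prints about 23476.674
    return 0;
}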
Upvotes: 4