exsnake
exsnake

Reputation: 1823

Numbers of collision in a hash table

I'm doing a hash table for store elements in this range: 2000000-20000000 of values.

examples: 17664658-8,7587458-8,7338375-4,5741259-2.....

In a sample of 100000 elements the number of collisions is about 23939 and in a sample of 1000000 elements the number of collisions is about 439870. I don't know much about hash tables, but this numbers of collisions are not a little high?

I read that in a controlled range of numbers you can have a good hash function fairly uniform , but do not know how or where to start , any advice ?

this is my hash fuction.

int hash(char* clave,int m) { //m is the size of the table (about the double of the elements to be stored)
        int number=0,i=0;
        /// 
        while(isdigit(clave[i])){ //i don't use the last 2 characters. 
            number+=(clave[i++]-'0');
            if(isdigit(clave[i]))
                number*=10;
        }
        /// mutiplication method
        float dis,R;
        R=0.6106154;
        dis = R*(number) - floor(R*(number));
        int result = (int)(dis*m);
        return result;
    }

Upvotes: 2

Views: 3616

Answers (1)

samgak
samgak

Reputation: 24417

No, the number of collisions is not too high, in fact it's about what you'd expect. The formula for the expected number of collisions in a hash table with a uniform, random hash function and m buckets and n insertions is:

n - m * (1 - ((m-1)/m)^n)

For your case:

m = 178144
n = 100000

Plugging the numbers in gives:

100000 - 178144 * (1 - ((178144-1)/178144) ^ 100000)
= 23476.674

and the observed number of collisions is 23939. So there is nothing wrong with your hash function.

Upvotes: 4

Related Questions