Reputation: 23
So I'm working with a file that has 400 data values, all ints and ranging from 4 to 20,000 in value. I load all these into an array of size 400. There is another empty array of ListNodes of size 600 that I will move the data to, but using a self-written hash code (I'll post it below).
Because each index in the array of length 600 has a ListNode in it, if there are any collisions, then the data value is added to the back of the ListNode. I also have a method that returns the percent of the array that is null. But basically since I'm loading 400 data values into an array of size 600, the least percent of nulls I can have is 33.3% because if there are no collisions, then 400 slots in the array are taken and 200 are null, but this is not the case:
return (num+123456789/(num*9365))%600; //num is the value read from the array of 400
That hashCode has given me my best result of 48.3% nulls and I need it to be below 47% at least. any suggestions or solutions to imporve this hashCode? I would greatly appreciate any help. If you need any more info or details please let me know. Thank you!!!
Upvotes: 0
Views: 625
Reputation: 43738
I did some experiments with random numbers: generate 400 uniformly distributed random numbers in the range [0, 599] and check how many values in that range are not generated. It turns out, on average 51.3 % of the values are not generated. So your 48.3 % is already better than expected. The target of 47 % seems unrealistic unless some form of perfect hashing is used.
If you want to make some experiments on your own, here is the program.
public static void main(String[] args) {
Random r = new Random();
int[] counts = new int[600];
for (int i = 0; i < 400; i++) {
counts[r.nextInt(600)]++;
}
int n = 0;
for (int i = 0; i < 600; i++) {
if (counts[i] == 0) {
n++;
}
}
System.out.println(100.0 * n / 600);
}
Upvotes: 1
Reputation: 2352
I'd use JAVAs implementation of the hashing-algorithm:
Hava a look at open-jdk HashMap
static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
Note you have too add a modulo-operation to make sure that the value wouldn't be greater than 600
EDIT 1
>>> is logical shift right
EXAMPLE:
10000000 >>> 2 = 00100000
Upvotes: 0