Reputation: 777
What is the probability of collision if a new hash code is generated by added 2 other hash codes in Java
Eg:
Integer reportHashCode = reportFields.hashCode() + reportId.hashCode();
Let's assume Java's hash code is 32bits and we can ignore normal collision in the hash code itself.
Upvotes: 3
Views: 1420
Reputation: 12019
How about we find out? The below program will simulate this for you. Note that the two addends for the sum are generated at random, so both have approximately the full integer range of probability. In reality the two hash codes that you sum might not have a flat distribution over the entire integer space. The program could be adjusted to simulate that.
package hashcode;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;
public class HashCode {
// Number of test cases
private static final int TEST_CASES = 10_000_000;
public static void main(String[] args) {
// Random number generator
Random rand = new Random();
// Map from integers (result hash codes) to a list of addend pairs that formed those hash codes
Map<Integer, Set<Pair>> hashCodesToComposites = new HashMap<>();
// Number of collissions
int collisions = 0;
// Running simulations
for (int i = 0; i < TEST_CASES; ++i) {
if (TEST_CASES / 4 == i) {
System.out.println("25 %");
}
if (TEST_CASES / 2 == i) {
System.out.println("50 %");
}
if ((TEST_CASES * 3) / 4 == i) {
System.out.println("75 %");
}
// Generating addends as random integers
int first = rand.nextInt();
int second = rand.nextInt();
// The pair; its hash code is the sum of the above
Pair pair = new Pair(first, second);
// Did it occur before?
if (hashCodesToComposites.containsKey(pair.hashCode())) {
// Getting the set of addend pairs that created this hash code
Set<Pair> composites = hashCodesToComposites.get(pair.hashCode());
// Checking if by any chance the two random numbers happened to be the same (almost negligible)
if (!composites.contains(pair)) {
// Actual collision from different numbers
collisions++;
// Adding to the set of composites
composites.add(pair);
} // Same numbers; doesn't count as collision
} else {
// First occurrence of this hash code
Set<Pair> composites = new HashSet<>();
composites.add(pair);
hashCodesToComposites.put(pair.hashCode(), composites);
}
}
// Results
System.out.println("Test cases: " + TEST_CASES);
System.out.println("Collisions: " + collisions);
System.out.println("Probability: " + ((double) collisions / (double) TEST_CASES));
}
private static class Pair {
final int first;
final int second;
final int hashCode;
Pair(int first, int second) {
this.first = first;
this.second = second;
hashCode = first + second;
}
@Override
public int hashCode() {
return hashCode;
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
final Pair other = (Pair) obj;
return (this.first == other.first && this.second == other.second) || (this.first == other.second && this.second == other.first);
}
}
}
The outcome is usually around 0.00115. This means there's roughly a 0.115 % chance of collisions. I've run the below to find out what the odds are for collisions between just random integers.
package hashcode;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
public class HashCode2 {
// Number of test cases
private static final int TEST_CASES = 10_000_000;
public static void main(String[] args) {
// Random number generator
Random rand = new Random();
Set<Integer> hashCodes = new HashSet<>();
// Number of collissions
int collisions = 0;
// Running simulations
for (int i = 0; i < TEST_CASES; ++i) {
if (TEST_CASES / 4 == i) {
System.out.println("25 %");
}
if (TEST_CASES / 2 == i) {
System.out.println("50 %");
}
if ((TEST_CASES * 3) / 4 == i) {
System.out.println("75 %");
}
int next = rand.nextInt();
if (hashCodes.contains(next)) {
collisions++;
} else {
hashCodes.add(next);
}
}
// Results
System.out.println("Test cases: " + TEST_CASES);
System.out.println("Collisions: " + collisions);
System.out.println("Probability: " + ((double) collisions / (double) TEST_CASES));
}
}
The probability is actually about the same. It is only slightly lower, but still rounds to 0.115 %. Finally, I tried the first program again, but with an xor in the hashCode method of Pair
instead of a sum. The result? Pretty much the same thing again.
So in the end, you can expect very close to the same collision rate as random integers for a sum of two hash codes and an xor, provided both hash codes being summed/xor'ed have a good distribution.
Upvotes: 1
Reputation: 121028
I would XOR
here instead of addition, cause xor has a 50-50% of distribution of 1
and 0
.
Upvotes: 3