Stenal P Jolly
Stenal P Jolly

Reputation: 777

What are the chances for collision in sum of 2 hash codes?

What is the probability of collision if a new hash code is generated by added 2 other hash codes in Java

Eg:

Integer reportHashCode = reportFields.hashCode() + reportId.hashCode();

Let's assume Java's hash code is 32bits and we can ignore normal collision in the hash code itself.

Upvotes: 3

Views: 1420

Answers (2)

G_H
G_H

Reputation: 12019

How about we find out? The below program will simulate this for you. Note that the two addends for the sum are generated at random, so both have approximately the full integer range of probability. In reality the two hash codes that you sum might not have a flat distribution over the entire integer space. The program could be adjusted to simulate that.

package hashcode;

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class HashCode {
    // Number of test cases
    private static final int TEST_CASES = 10_000_000;
    public static void main(String[] args) {
        // Random number generator
        Random rand = new Random();
        // Map from integers (result hash codes) to a list of addend pairs that formed those hash codes
        Map<Integer, Set<Pair>> hashCodesToComposites = new HashMap<>();
        // Number of collissions
        int collisions = 0;
        // Running simulations
        for (int i = 0; i < TEST_CASES; ++i) {
            if (TEST_CASES / 4 == i) {
                System.out.println("25 %");
            }
            if (TEST_CASES / 2 == i) {
                System.out.println("50 %");
            }
            if ((TEST_CASES * 3) / 4 == i) {
                System.out.println("75 %");
            }
            // Generating addends as random integers
            int first = rand.nextInt();
            int second = rand.nextInt();
            // The pair; its hash code is the sum of the above
            Pair pair = new Pair(first, second);
            // Did it occur before?
            if (hashCodesToComposites.containsKey(pair.hashCode())) {
                // Getting the set of addend pairs that created this hash code
                Set<Pair> composites = hashCodesToComposites.get(pair.hashCode());
                // Checking if by any chance the two random numbers happened to be the same (almost negligible)
                if (!composites.contains(pair)) {
                    // Actual collision from different numbers
                    collisions++;
                    // Adding to the set of composites
                    composites.add(pair);
                } // Same numbers; doesn't count as collision
            } else {
                // First occurrence of this hash code
                Set<Pair> composites = new HashSet<>();
                composites.add(pair);
                hashCodesToComposites.put(pair.hashCode(), composites);
            }
        }
        // Results
        System.out.println("Test cases: " + TEST_CASES);
        System.out.println("Collisions: " + collisions);
        System.out.println("Probability: " + ((double) collisions / (double) TEST_CASES));
    }
    private static class Pair {
        final int first;
        final int second;
        final int hashCode;
        Pair(int first, int second) {
            this.first = first;
            this.second = second;
            hashCode = first + second;
        }
        @Override
        public int hashCode() {
            return hashCode;
        }
        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            final Pair other = (Pair) obj;
            return (this.first == other.first && this.second == other.second) || (this.first == other.second && this.second == other.first);
        }
    }
}

The outcome is usually around 0.00115. This means there's roughly a 0.115 % chance of collisions. I've run the below to find out what the odds are for collisions between just random integers.

package hashcode;

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class HashCode2 {
    // Number of test cases
    private static final int TEST_CASES = 10_000_000;
    public static void main(String[] args) {
        // Random number generator
        Random rand = new Random();
        Set<Integer> hashCodes = new HashSet<>();
        // Number of collissions
        int collisions = 0;
        // Running simulations
        for (int i = 0; i < TEST_CASES; ++i) {
            if (TEST_CASES / 4 == i) {
                System.out.println("25 %");
            }
            if (TEST_CASES / 2 == i) {
                System.out.println("50 %");
            }
            if ((TEST_CASES * 3) / 4 == i) {
                System.out.println("75 %");
            }
            int next = rand.nextInt();
            if (hashCodes.contains(next)) {
                collisions++;
            } else {
                hashCodes.add(next);
            }
        }
        // Results
        System.out.println("Test cases: " + TEST_CASES);
        System.out.println("Collisions: " + collisions);
        System.out.println("Probability: " + ((double) collisions / (double) TEST_CASES));
    }
}

The probability is actually about the same. It is only slightly lower, but still rounds to 0.115 %. Finally, I tried the first program again, but with an xor in the hashCode method of Pair instead of a sum. The result? Pretty much the same thing again.

So in the end, you can expect very close to the same collision rate as random integers for a sum of two hash codes and an xor, provided both hash codes being summed/xor'ed have a good distribution.

Upvotes: 1

Eugene
Eugene

Reputation: 121028

I would XOR here instead of addition, cause xor has a 50-50% of distribution of 1 and 0.

Upvotes: 3

Related Questions