Marnix
Marnix

Reputation: 3

Birthday paradox

I want to simulate the birthday paradox in java. For some reason my output(probability) keeps getting very close to 1 e.g. simulation(10)->0,9268. In start you can see the probabilities my simulations should be close to. I have been looking for a mistake in my code for quite a while now so i hope any of you might be able to help me. I've looked up other codes of the birthday paradox but none of them seem able to help me with my strange output. p.s. you can ignore the //TODO, ill fix that once i get the code up and running. Thanks is advance!

static final int DAYS_IN_YEAR = 365;
static final int NUMBER_OF_SIMULATIONS = 500; 

LCG random;
HashSet<Integer> birthdaySet = new HashSet<Integer>();

BirthdayProblem2(){
    birthdaySet.clear();
    random = new LCG(); //random numbers between 0 and 1
}

int generateBirthday(){ //generates random birthday
    return (int) Math.round(Math.random()*DAYS_IN_YEAR); //TODO use LGC
}

double iteration(int numberOfStudents){ //one iteration from the simulation
    int sameBirthdays = 0;

    for (int i = 0; i < numberOfStudents; i++){
        int bd = generateBirthday();

        if (birthdaySet.contains(bd)){
            sameBirthdays++;
        } else {
            birthdaySet.add(bd);
        }           
    }
    return (double) sameBirthdays/numberOfStudents; //probability of two students having the same birthday
}

void simulation(int numberOfStudents){
    double probabilitySum = 0;
    for (int i = 0; i < NUMBER_OF_SIMULATIONS; i++){
        probabilitySum += iteration(numberOfStudents);
    }
    System.out.printf("For n=%d -> P=%.4f \n", numberOfStudents, probabilitySum/NUMBER_OF_SIMULATIONS);
}

private void start() {

    simulation(10); //should be about 0.1
    simulation(20); //should be about 0.4
    simulation(23); //should be about 0.5
    simulation(35); //should be about 0.8
}

public static void main(String[] argv) {
    new BirthdayProblem2().start();
}

}

Upvotes: 0

Views: 1454

Answers (2)

Andreas
Andreas

Reputation: 159260

You are not clearing the birthdaySet between iterations. Change it from a field to a local variable will prevent errors like that, because why would you need it as a field in the first place?

On a different note, generateBirthday should use Random.nextInt(DAYS_IN_YEAR). An instance of Random is a prime candidate as field. What is that LCG random; line anyway?

UPDATE

Method iteration() should return boolean, for whether or not there are students with same birthday. Number of true values returned divided by number of calls equals probability.

Since there is a relative small number of days in a year, a boolean array will be faster than a Set, so here is updated code:

private static final int DAYS_IN_YEAR = 365;
private static final int NUMBER_OF_SIMULATIONS = 500;
private Random rnd = new Random();
private int generateBirthday() {
    return this.rnd.nextInt(DAYS_IN_YEAR);
}
private boolean iteration(int numberOfStudents) { //one iteration from the simulation
    boolean[] present = new boolean[DAYS_IN_YEAR];
    for (int i = 0; i < numberOfStudents; i++) {
        int birthday = generateBirthday();
        if (present[birthday])
            return true;
        present[birthday] = true;
    }
    return false;
}
void simulation(int numberOfStudents){
    int count = 0;
    for (int i = 0; i < NUMBER_OF_SIMULATIONS; i++)
        if (iteration(numberOfStudents))
            count++;
    System.out.printf("For n=%d -> P=%.4f%n", numberOfStudents, (double)count/NUMBER_OF_SIMULATIONS);
}

Sample Output

For n=10 -> P=0.1340
For n=20 -> P=0.4120
For n=23 -> P=0.5080
For n=35 -> P=0.8160
For n=10 -> P=0.1200
For n=20 -> P=0.4120
For n=23 -> P=0.5260
For n=35 -> P=0.8140
For n=10 -> P=0.1260
For n=20 -> P=0.3920
For n=23 -> P=0.5080
For n=35 -> P=0.7920

Increasing NUMBER_OF_SIMULATIONS to 5_000_000:

For n=10 -> P=0.1167
For n=20 -> P=0.4113
For n=23 -> P=0.5075
For n=35 -> P=0.8145
For n=10 -> P=0.1168
For n=20 -> P=0.4113
For n=23 -> P=0.5072
For n=35 -> P=0.8142

Upvotes: 4

Work of Artiz
Work of Artiz

Reputation: 1100

As andreas already states birthdaySet should be cleared between iterations or the set will contain data from the previous simulation

However there's another fatal flaw in your design, to see it look at the wikipedia page:

In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday.

It's the chance that there exists at least 2 people with the same birthday. That means that as soon as 2 are equal 1 should be returned.

I've summed it up in the code below

import java.util.HashSet;
import java.util.Random;

public class BirthdayProblem2 {
    static final int DAYS_IN_YEAR = 365;
    static final int NUMBER_OF_SIMULATIONS = 500;

    Random random;
    HashSet<Integer> birthdaySet = new HashSet<Integer>();

    BirthdayProblem2() { random = new Random(); }

    int generateBirthday() { return random.nextInt(DAYS_IN_YEAR); }

    double iteration(int numberOfStudents) { //one iteration from the simulation
        birthdaySet.clear();

        for (int i = 0; i < numberOfStudents; i++) {
            int bd = generateBirthday();

            if (birthdaySet.contains(bd)) {
                return 1.0;
            }

            birthdaySet.add(bd);
        }

        return 0.0;
    }

    void simulation(int numberOfStudents) {
        double probabilitySum = 0;

        for (int i = 0; i < NUMBER_OF_SIMULATIONS; i++) {
            probabilitySum += iteration(numberOfStudents);
        }

        System.out.printf("For n = %d -> P = %.4f \n", numberOfStudents, probabilitySum / NUMBER_OF_SIMULATIONS);
    }

    private void start() {
        simulation(10); //should be about 0.1
        simulation(20); //should be about 0.4
        simulation(23); //should be about 0.5
        simulation(35); //should be about 0.8
    }

    public static void main(String... whatever) { new BirthdayProblem2().start(); }
}

Upvotes: 1

Related Questions