John

Reputation: 23

Trying to Solve Monty Hall in Python

I'm trying to understand this solution to the Monty Hall problem. I understand most of the code, but I'm stuck on two pieces.

Below is the code, but specifically I'm stuck on these two parts

result[bad] = np.random.randint(0,3, bad.sum())

and the entire switch_guess function.

If anyone could explain these in plain English, that would be awesome.

import numpy as np

#Simulates picking a prize door
def simulate_prizedoor(nsim):
    return np.random.randint(0,3,(nsim))

#Simulates the doors guessed
def simulate_guesses(nsim):
    return np.zeros(nsim, dtype=int)

#Simulates the "game host" showing whats behind a door
def goat_door(prize_doors, guesses):
    result = np.random.randint(0,3, prize_doors.size)
    while True:
        bad = (result == prize_doors) | (result == guesses)
        if not bad.any():
            return result
    result[bad] = np.random.randint(0,3, bad.sum())

#Used to change your guess
def switch_guess(guesses, goat_doors):
    result = np.zeros(guesses.size)
    switch = {(0, 1): 2, (0, 2): 1, (1, 0): 2, (1, 2): 0, (2, 0): 1, (2, 1): 0}
    for i in [0,1,2]:
        #print "i = ", i
        for j in [0,1,2]:
            #print "j = ", j
            mask = (guesses == i) & (goat_doors == j)
            #print "mask = ", mask
            if not mask.any():
                continue
            result = np.where(mask, np.ones_like(result) * switch[(i, j)], result)
    return result

#Calculates the win percentage
def win_percentage(guesses, prizedoors):
    return 100 * (guesses == prizedoors).mean()

#The code to pull everything together
nsim = 10000

#keep guesses
print("Win percentage when keeping original door")
print(win_percentage(simulate_prizedoor(nsim), simulate_guesses(nsim)))

#switch
pd = simulate_prizedoor(nsim)
guess = simulate_guesses(nsim)
goats = goat_door(pd, guess)
guess = switch_guess(guess, goats)
print("Win percentage when switching doors")
print(win_percentage(pd, guess))

Upvotes: 1

Views: 1244

Answers (2)

Evan

Reputation: 1

This confused me too, until 5 minutes ago when I finally figured it out. Since the first question has been answered, I will only talk about the second one.

The intuition goes like this: given a sequence of (guesses, goat_doors) pairs, each pass of the (i, j) loop may 'hit' some simulations (e.g., simulation[0] and simulation[5]), that is to say, the 0th and 5th simulations have guess i and goat door j.

The variable mask records which simulations were hit: 0 and 5 in this example. The result for the 0th and 5th simulations can then be decided, because in those simulations the only door left to switch to is determined by i and j. So np.where updates result in those simulations and leaves the other simulations unchanged.

That's the intuition. You need to know how np.where works to follow what I'm describing. Good luck.
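
To see the mask and np.where step in isolation, here is a small sketch of a single pass of the loop (i = 0, j = 1). The six-element arrays are made-up illustration values, chosen so that simulations 0 and 5 are the ones hit; they are not output from the original program.

```python
import numpy as np

# Hypothetical inputs: six simulations, every guess is door 0.
guesses = np.zeros(6, dtype=int)
goat_doors = np.array([1, 2, 2, 2, 2, 1])
result = np.zeros(6)

# One pass of the (i, j) loop, for i = 0 and j = 1.
mask = (guesses == 0) & (goat_doors == 1)
print(mask.tolist())    # [True, False, False, False, False, True]

# Where mask is True, take the door left over after guess 0 and goat 1
# (door 2); everywhere else keep whatever result already holds.
result = np.where(mask, 2, result)
print(result.tolist())  # [2.0, 0.0, 0.0, 0.0, 0.0, 2.0]
```

The other (i, j) passes fill in the remaining positions the same way, one combination at a time.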

Upvotes: 0

abarnert

Reputation: 365707

… specifically I'm stuck on these two parts

result[bad] = np.random.randint(0,3, bad.sum())

Let's break this down into pieces. It may help to reduce that 10000 to something small, like 5, so you can print out the values (either with print calls, or in the debugger) and see what's going on.

When we start this function, prize_doors is going to have 5 random values from 0 to 2, like 2 2 0 1 2, and guesses will have 5 values, all 0, like 0 0 0 0 0. result will therefore start off with 5 random values from 0 to 2, like 0 2 2 0 1.

The first time through the loop, bad will be an array of 5 bool values, each True if the corresponding value in result matches the corresponding value in either prize_doors or guesses. So, in this example, True True False True False, because result #1 matches prize_doors, and results #0 and #3 match guesses.
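
You can check that comparison yourself with the example values from this answer typed in by hand (in the real function, prize_doors and result are random):

```python
import numpy as np

# Example values from the walkthrough, fixed by hand for illustration.
prize_doors = np.array([2, 2, 0, 1, 2])
guesses = np.zeros(5, dtype=int)
result = np.array([0, 2, 2, 0, 1])

# Elementwise: True wherever the host's door collides with the prize or the guess.
bad = (result == prize_doors) | (result == guesses)
print(bad.tolist())  # [True, True, False, True, False]
```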

Unfortunately, we're just going to go around that loop forever, because there's nothing inside the loop that modifies result, and therefore bad is going to be the same forever, and doing the same check forever is always going to return the same values.


But if you indent that result[bad] = … line so it's inside the loop, that changes everything. So, let's assume that's what you were supposed to do, and you just copied it wrong.
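
A sketch of the function with that line moved inside the loop (this assumes the indentation was the only thing lost in copying; the five-simulation call at the bottom is just for illustration):

```python
import numpy as np

def goat_door(prize_doors, guesses):
    # Start with a random door for every simulation.
    result = np.random.randint(0, 3, prize_doors.size)
    while True:
        bad = (result == prize_doors) | (result == guesses)
        if not bad.any():
            return result
        # Now inside the loop: redraw only the entries that are still bad.
        result[bad] = np.random.randint(0, 3, bad.sum())

prize_doors = np.random.randint(0, 3, 5)
guesses = np.zeros(5, dtype=int)
goats = goat_door(prize_doors, guesses)
print(prize_doors, goats)
```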

When treated as numbers, True and False have values 1 and 0, respectively. So, bad.sum() is a count of how many True values there are in bad: in this case, 3.

So, np.random.randint(0, 3, bad.sum()) picks 3 random values from 0 to 2, let's say 1 0 1.

Now, result[bad] selects all of the elements of result for which the corresponding value in bad is True, so in this example it's result[0], result[1], and result[3].

So we end up assigning that 1 0 1 to those three selected locations, so result is now 1 0 2 1 1.
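
Here is that one step on its own, with the "random" draw replaced by the fixed values 1 0 1 from the example so the outcome is reproducible:

```python
import numpy as np

result = np.array([0, 2, 2, 0, 1])
bad = np.array([True, True, False, True, False])

print(bad.sum())  # 3

# Fixed stand-ins for np.random.randint(0, 3, bad.sum()).
new_values = np.array([1, 0, 1])

# Boolean-mask assignment: only positions 0, 1 and 3 are overwritten, in order.
result[bad] = new_values
print(result.tolist())  # [1, 0, 2, 1, 1]
```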

So, next time through the loop, bad is now False True False True False, because result #1 now matches guesses and result #3 now matches prize_doors. We've still got at least one True value, so we run that result[bad] = line again. This time, bad.sum() is 2, so we pick 2 random values, let's say 1 2, and we then assign those 2 values to result[1] and result[3], so result is now 1 1 2 2 1.

The next time through, bad is now False False False False False, so bad.any() is False, so we're done.

In other words, each time through, we take all the values that match either the prize door or the contestant's guess, and pick a new door for them, until finally there are no such values.

Upvotes: 2
