cumin
cumin

Reputation: 491

using replicate() to roll 100 dice 10,000 times; unexpected results

In an EdX R stats class, we are asked to look at the proportion of times a '6' is rolled in a set of 100 die rolls. Then we are asked to roll 100 dice 10,000 times, to see the standard deviation of the difference in means from the 100-die rolls.

The results of the 100-die rolls are as expected; around 0.1703 or so (1/6 = 0.1666667)

But when I load up replicate() to throw sets of 100 dice 10,000 times to see 10,000 means, the results are not what I expect. I don't see any values outside the range of a z-score = 2:

set.seed(1)
# get mean of 100 dice rolls
mean100dice <- function(){
   n=100
   x <- replicate(n, sample(1:6, n, replace=TRUE), simplify='vector')
   mean(x==6)
}
mean100dice() #these come out as expected

means10k <- replicate(10000, mean100dice(),simplify='vector')
p = 1/6
z = (means10k - p) / sqrt(p*(1-p)/n)
mean(z > 2)       ## I expect this to be > 0
range(means10k)   ## sanity check

> mean(z > 2)
[1] 0
> range(means10k)
[1] 0.1522 0.1806

Upvotes: 1

Views: 922

Answers (2)

Richie Cotton
Richie Cotton

Reputation: 121077

At a guess, you set n <- 100 instead of n <- 10000 when calculating z.

It's a good idea to provide explicit variable names, so you don't mixed up. For example, you need to distinguish n_dice_rolls and n_replicates.


Incidentally, your code for calculating the mean of 100 dice rolls is not correct.

sample(1:6, n, replace=TRUE) rolls n dice; you don't need to call replicate() as well. I think you want something like this.

roll_nd6 <- function(n_dice) {
  sample(1:6, n_dice, replace = TRUE)
}
get_fraction_of_sixes_from_rolling_nd6 <- function(n_dice) {
  mean(roll_nd6(n_dice) == 6L)
}
monte_carlo_simulate_get_fraction_of_sixes <- function(n_replications, n_dice) {
  replicate(
    n_replications, 
    get_fraction_of_sixes_from_rolling_nd6(n_dice),
    simplify = "vector"
  )
}
calc_z_score <- function(actual_p, expected_p) {
  (actual_p - expected_p) / 
    sqrt(expected_p * (1 - expected_p) / length(actual_p))
}
actual_fraction_of_sixes <- monte_carlo_simulate_get_fraction_of_sixes(10000, 100)
z_scores <- calc_z_score(actual_fraction_of_sixes, 1 / 6)

Upvotes: 1

Christoph Wolk
Christoph Wolk

Reputation: 1758

You have a mistake in mean100dice: You sample 100 dice, and replicate that 100 times, so it's actually not the average of a 100 dice, but of 100*100 = 10,000 dice. Of course, the mean of that is going to be much closer to p on average.

Upvotes: 0

Related Questions