Reputation: 20379
Where is my mistake in the following code/reasoning.
If I have n
items and I want to count the number of combinations (i.e. order does not matter) of draws of size k
with replacement I can use the binomial coefficient:
choose(n + k - 1, k)
When I want to count the number of possibilities where element i
is not part of the draw I would thus use
choose(n + k - 1 - 1, k)
(Basically I have one option less to draw from).
E.g. with n=3, k= 3
I have the following 10
(choose(3 + 3 - 1, 3)
) draws:
# AAA, AAB, ABB, BBB, BBC, BCC, CCC, CCA, CAA, ABC
The ones where element 'A'
(say) does not appear, are the following 4
(choose(3 + 3 - 1 - 1, 3)
):
# BBB, BBC, BCC, CCC
So far so good. I can thus calculate the probability that I have a draw with n = k
where element i
is not appearing:
freeFromCounts <- function(n) choose(2 * n - 2, n)
totalCounts <- function(n) choose(2 * n - 1, n)
ratio <- function(n) freeFromCounts(n) / totalCounts(n) ## (n - 1) / (2 * n - 1)
So here is my problem if I simulate the draws (with sample(.)
) and repeat that n.rep
times, I would expect to see about n.rep * ratio(n)
draws where element 1
(say) is not present. This is, however, not the case. Where is my mistake?
sim <- function(n, n.rep = 10000, x0 = 1) {
sum(replicate(n.rep, {
s <- sample(n, n, TRUE)
all(s != x0)
})) / n.rep
}
set.seed(12)
sim(10, 1e6) # [1] 0.348278
ratio(10) # [1] 0.4736842
Upvotes: 2
Views: 64
Reputation: 20379
A colleague found the solution. sample
samples not unordered but ordered, i.e. it make a difference between [1, 2, 3]
and [3, 2, 1]
. With this knowledge the figures are the same: 9 ^ 10 / 10 ^ 10 = 0.3486784
.
So to do the simulation for the original problem, you have to come up with a variant of sample
which does unordered sampling. Idea is following the famous derivation of the formula for unordered sampling with replacement (cf. e.g. http://mathworld.wolfram.com/Multichoose.html)
sampleUnordered <- function(n, k) {
aS <- rep(0, n + k -1)
aS[!seq_along(aS) %in% sample(n + k - 1, k)] <- seq.int(n-1)
rl <- rle(aS)
zeros <- which(rl$values == 0)
good <- rl$lengths[zeros]
nr <- c(rl$values, n)[zeros + 1]
rep(nr, good)
}
sim <- function(n, n.rep = 10000, x0 = 1) {
sum(replicate(n.rep, {
s <- sampleUnordered(n, n)
all(s != x0)
})) / n.rep
}
set.seed(12)
sim(10, 1e6) # [1] 0.473234
Upvotes: 1