Reputation: 335
Setup:
The question is complex form of a classic probability question:
70 colored balls are placed in an urn, 10 for each of the seven rainbow colors.
What is the expected number of distinct colors in 20 randomly picked balls?
My solution is python's itertools library:
combos = itertools.combinations(urn, 20)
,
print sum([1 for x in combos])
(where urn is a list of the 70 balls in the urn).
I can unpack the iterator up to a length of combinations(urn, 8)
past that my computer can't handle it.
Note: I know this wouldn't give me the answer, this is only the road block in my script, in other words if this worked my script would work.
Question: How could I find the expected colors accurately, without the worlds fastest super computer? Is my way even computationally possible?
Upvotes: 3
Views: 2672
Reputation: 3316
Since a couple of people have asked to see the mathematical solution, I'll give it. This is one of the Project Euler problems that can be done in a reasonable amount of time with pencil and paper. The answer is
7(1 - (60 choose 20)/(70 choose 20))
To get this write X, the count of colors present, as a sum X0+X1+X2+...+X6, where Xi is 1 if the ith color is present, and 0 if it is not present.
E(X)
= E(X0+X1+...+X6)
= E(X0) + E(X1) + ... + E(X6) by linearity of expectation
= 7E(X0) by symmetry
= 7 * probability that a particular color is present
= 7 * (1- probability that a particular color is absent)
= 7 * (1 - (# ways to pick 20 avoiding a color)/(# ways to pick 20))
= 7 * (1 - (60 choose 20)/(70 choose 20))
Expectation is always linear. So, when you are asked to find the average value of some random quantity, it often helps to try to rewrite the quantity as a sum of simpler pieces such as indicator (0-1) random variables.
This does not say how to make the OP's approach work. Although there is a direct mathematical solution, it is good to know how to iterate through the cases in an organized and practicable fashion. This could help if you next wanted a more complicated function of the set of colors present than the count. Duffymo's answer suggested something that I'll make more explicit:
You can break up the ways to draw 20 calls from 70 into categories indexed by the counts of colors. For example, the index (5,5,10,0,0,0,0) means we drew 5 of the first color, 5 of the second color, 10 of the third color, and none of the other colors.
The set of possible indices is contained in the collection of 7-tuples of nonnegative integers with sum 20. Some of these are impossible, such as (11,9,0,0,0,0,0) by the problem's assumption that there are only 10 balls of each color, but we can deal with that. The set of 7-tuples of nonnegative numbers adding up to 20 has size (26 choose 6)=230230, and it has a natural correspondence with the ways of choosing 6 dividers among 26 spaces for dividers or objects. So, if you have a way to iterate through the 6 element subsets of a 26 element set, you can convert these to iterate through all indices.
You still have to weight the cases by the counts of the ways to draw 20 balls from 70 to get that case. The weight of (a0,a1,a2,...,a6) is (10 choose a0)(10 choose a1)...*(10 choose a6). This handles the case of impossible indices gracefully, since 10 choose 11 is 0 so the product is 0.
So, if you didn't know about the mathematical solution by the linearity of expectation, you could iterate through 230230 cases and compute a weighted average of the number of nonzero coordinates of the index vector, weighted by a product of small binomial terms.
Upvotes: 14
Reputation: 308763
Wouldn't it just be combinations with repetition?
http://www.mathsisfun.com/combinatorics/combinations-permutations.html
Upvotes: 1
Reputation: 23753
Upvotes: -2