Reputation: 407
I have a count and a list of words:
count=100
list = ['apple','orange','mango']
For the count above, is it possible to use the random module to select apple 40% of the time, orange 30% of the time, and mango 30% of the time?
For example, with count=100: apple 40 times, orange 30 times, and mango 30 times.
The selection has to happen randomly.
Upvotes: 3
Views: 1534
Reputation: 9621
Based on an answer to the question about generating discrete random variables with specified weights, you can use numpy.random.choice to get 20 times faster code than with random.choice:
from numpy.random import choice
sample = choice(['apple','orange','mango'], p=[0.4, 0.3, 0.3], size=1000000)
from collections import Counter
print(Counter(sample))
Outputs:
Counter({'apple': 399778, 'orange': 300317, 'mango': 299905})
Not to mention that it is actually easier than "to build a list in the required proportions and then shuffle it".
Also, shuffle would always produce exactly 40% apples, 30% oranges and 30% mangoes, which is not the same as saying "produce a sample of a million fruits according to a discrete probability distribution". The latter is what both choice solutions do (and the bisect one too). As can be seen above, there are about 40% apples, etc., when using numpy.
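If numpy is not available, the same kind of weighted draw can be sketched with the standard library. This assumes Python 3.6 or later, where random.choices accepts a weights argument; the variable names below are only for illustration.

```python
import random
from collections import Counter

fruits = ['apple', 'orange', 'mango']
weights = [0.4, 0.3, 0.3]

# Each draw is independent, so the proportions come out close to
# 40/30/30 but are not guaranteed to match them exactly.
sample = random.choices(fruits, weights=weights, k=100_000)
counts = Counter(sample)
for fruit in fruits:
    print(fruit, counts[fruit] / len(sample))
```

The printed fractions vary from run to run, which is the difference from the shuffle approach.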
Upvotes: 4
Reputation: 226704
The easiest way is to build a list in the required proportions and then shuffle it.
>>> import random
>>> result = ['apple'] * 40 + ['orange'] * 30 + ['mango'] * 30
>>> random.shuffle(result)
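The same idea can be wrapped in a small helper for other counts. This is only a sketch in Python 3: exact_sample is a made-up name, and it assumes the shares sum to 1 and that each share times the count comes out as a whole number.

```python
import random
from collections import Counter

def exact_sample(shares, count):
    """Return `count` items in exactly the given proportions, shuffled.

    `shares` maps each item to its fraction of the total (assumed to sum to 1).
    """
    result = []
    for item, share in shares.items():
        result.extend([item] * round(count * share))
    if len(result) != count:
        # The shares do not split this count into whole numbers.
        raise ValueError("shares do not divide count evenly")
    random.shuffle(result)
    return result

sample = exact_sample({'apple': 0.4, 'orange': 0.3, 'mango': 0.3}, 100)
# Always exactly 40 apples, 30 oranges and 30 mangoes; only the order is random.
print(Counter(sample))
```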
Edit for the new requirement that the count is really 1,000,000:
>>> count = 1000000
>>> pool = ['apple'] * 4 + ['orange'] * 3 + ['mango'] * 3
>>> for i in range(count):
...     print(random.choice(pool))
A slower but more general alternative approach is to bisect a cumulative probability distribution:
>>> import bisect
>>> choices = ['apple', 'orange', 'mango']
>>> cum_prob_dist = [0.4, 0.7]
>>> for i in range(count):
...     print(choices[bisect.bisect(cum_prob_dist, random.random())])
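For arbitrary weights, the cumulative distribution does not have to be typed in by hand. Here is a rough Python 3 sketch that builds it with itertools.accumulate; make_sampler and draw are hypothetical names, and the weights are assumed to be positive (they do not need to sum to 1).

```python
import bisect
import random
from itertools import accumulate

def make_sampler(choices, weights):
    """Build a function that draws one item according to `weights`."""
    cum = list(accumulate(weights))   # running totals of the weights
    total = cum[-1]
    # The last boundary is dropped, as in cum_prob_dist above: any value
    # past the final remaining boundary maps to the last choice.
    bounds = cum[:-1]

    def draw():
        return choices[bisect.bisect(bounds, random.random() * total)]

    return draw

draw = make_sampler(['apple', 'orange', 'mango'], [0.4, 0.3, 0.3])
sample = [draw() for _ in range(10_000)]
print(sample[:5])
```

Dropping the last boundary also keeps the index in range even if floating-point rounding nudges the scaled random value up to the total.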
Upvotes: 3