Reputation: 41
I don't understand the cum_weights parameter of random.choices.
I read that it is:
weight of previous element + own weight, so [10, 5, 1] → [10, 15, 16].
So as I understand it, the probability of "cherry" is 16, and it is the highest. So why does "apple" appear most often in the result?
import random
mylist = ["apple", "banana", "cherry"]
print(random.choices(mylist, cum_weights=[10, 5, 1], k=9))
outputs:
['apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple']
Upvotes: 3
Views: 1027
Reputation: 5162
The cum_weights parameter is not cumulated any further. You can look at the implementation of the function here: https://github.com/python/cpython/blob/3.9/Lib/random.py#L473
In particular, look at this line, where the output is generated: https://github.com/python/cpython/blob/3.9/Lib/random.py#L505
There, bisect finds the position at which the value random()*total would be inserted into cum_weights. In your case, cum_weights is [10, 5, 1]. Now look at this line to see how total is computed: https://github.com/python/cpython/blob/3.9/Lib/random.py#L500
total = cum_weights[-1] + 0.0
That means your total is always 1.0, because the last value of cum_weights is 1. Since random() returns a float in [0.0, 1.0), random()*total is always less than 1. bisect assumes cum_weights is sorted and only searches the first len(population) - 1 entries (here 10 and 5); the value is smaller than both, so it always returns index 0, the first item of your population. That is why your output contains only 'apple'. Even if you run it a thousand times, you will get nothing but 'apple' with your current cum_weights.
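To see the lookup in isolation, here is a small sketch that repeats the bisect call described above on your cum_weights. It assumes the selection step matches the linked CPython 3.9 source (total taken from the last element, search limited to hi = n - 1) and replaces random() with an evenly spaced sweep over [0.0, 1.0) so the result is deterministic; the names `samples` and `indices` are just for illustration.

```python
from bisect import bisect

cum_weights = [10, 5, 1]
n = len(cum_weights)

# Mirrors the linked source: total comes from the last element only.
total = cum_weights[-1] + 0.0   # 1.0
hi = n - 1                      # bisect only looks at cum_weights[0:2]

# Evenly spaced stand-ins for random(), covering [0.0, 1.0).
samples = [i / 1000 for i in range(1000)]
indices = {bisect(cum_weights, x * total, 0, hi) for x in samples}

print(total)    # 1.0
print(indices)  # {0}: every draw maps to "apple"
```

Swapping in cum_weights = [10, 15, 16] in the same sketch gives a total of 16.0, and the sweep then lands on indices 0, 1 and 2.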
Upvotes: 1
Reputation: 2947
You're mixing up relative weights and cumulative weights.
The cum_weights=[10, 5, 1] argument to random.choices is already the cumulative weight; it is not cumulated further.
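One way to check this is to seed the generator and compare weights=[10, 5, 1] against a cumulative list built with itertools.accumulate. This sketch assumes CPython, where choices turns weights into cumulative weights with accumulate and then runs the same selection step, so the same seed should produce the same picks; the seed value 42 is arbitrary.

```python
import random
from itertools import accumulate

mylist = ["apple", "banana", "cherry"]
weights = [10, 5, 1]
cum_weights = list(accumulate(weights))  # running sums: [10, 15, 16]

random.seed(42)
with_weights = random.choices(mylist, weights=weights, k=20)

random.seed(42)
with_cum = random.choices(mylist, cum_weights=cum_weights, k=20)

print(cum_weights)               # [10, 15, 16]
print(with_weights == with_cum)  # True: relative weights were converted for us
```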
Upvotes: 0
Reputation: 54733
Saying weights=[10, 5, 1] is the same as saying cum_weights=[10, 15, 16]. The cum_weights values must never decrease from one entry to the next for them to make sense; what you provided confuses the lookup. choices uses the last value of cum_weights (not the largest) as its random range, so in your case it draws a float from 0 up to, but not including, 1. Every such value is below the first cumulative weight, 10, which belongs to "apple", so it's always going to choose "apple".
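Going the other way makes the ordering requirement concrete: each item's weight is the difference between consecutive cumulative weights, and its probability is that difference divided by the last value. The sketch below uses a hypothetical helper `implied_weights` (not part of the random module) on both lists from the question; the negative values it produces for [10, 5, 1] show why that input does not describe a valid distribution.

```python
from fractions import Fraction


def implied_weights(cum_weights):
    """Hypothetical helper: recover per-item weights from cumulative weights."""
    previous = [0] + cum_weights[:-1]
    return [current - prev for prev, current in zip(previous, cum_weights)]


good = [10, 15, 16]
bad = [10, 5, 1]

good_weights = implied_weights(good)
# Exact probability of each item: its weight over the last cumulative value.
probabilities = [Fraction(w, good[-1]) for w in good_weights]

print(good_weights)          # [10, 5, 1]
print(probabilities)         # [Fraction(5, 8), Fraction(5, 16), Fraction(1, 16)]
print(implied_weights(bad))  # [10, -5, -4]: negative, so not valid cum_weights
```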
Upvotes: 0
Reputation: 120489
When you have relative weights, the cumulative weights are the running sums of those values, so your cum_weights should be [10, 15, 16]:
import random
mylist = ["apple", "banana", "cherry"]
print(random.choices(mylist, cum_weights=[10, 15, 16], k=14))
Example output (the draws are random, so yours will differ):
['apple', 'banana', 'cherry', 'banana', 'apple', 'banana', 'apple', 'apple', 'banana', 'banana', 'apple', 'banana', 'banana', 'banana']
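Since a single run of k=14 is noisy, a larger sample shows the proportions more clearly. This sketch builds cum_weights with itertools.accumulate instead of writing them by hand and tallies 100,000 draws with collections.Counter; the sample size is an arbitrary choice, and the exact counts change from run to run, but the shares should settle near 10/16, 5/16 and 1/16.

```python
import random
from collections import Counter
from itertools import accumulate

mylist = ["apple", "banana", "cherry"]
weights = [10, 5, 1]
cum_weights = list(accumulate(weights))  # [10, 15, 16]

draws = 100_000
counts = Counter(random.choices(mylist, cum_weights=cum_weights, k=draws))

for fruit, weight in zip(mylist, weights):
    observed = counts[fruit] / draws
    expected = weight / cum_weights[-1]
    print(f"{fruit}: observed {observed:.3f}, expected {expected:.4f}")
```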
Upvotes: 1