Lara

Reputation: 41

Why does random.choices always return the same element when I pass cum_weights with decreasing values?

I don't understand the cum_weights parameter of random.choices.

I read that it is:

Weight of previous element + own weight [10, 5, 1] → [10, 15, 16]

So as I understand it, the probability of "cherry" is 16, which is the highest. So why is "apple" the only result I get?

import random

mylist = ["apple", "banana", "cherry"]
print(random.choices(mylist, cum_weights=[10, 5, 1], k=9))

outputs:

['apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple']

Upvotes: 3

Views: 1027

Answers (4)

Md Johirul Islam

Reputation: 5162

The cum_weights parameter is not accumulated any further. You can look at the implementation of the function here: https://github.com/python/cpython/blob/3.9/Lib/random.py#L473

The output is generated on this line: https://github.com/python/cpython/blob/3.9/Lib/random.py#L505. There, bisect finds the position at which random()*total would be inserted into the cum_weights list. In your case cum_weights is [10, 5, 1].

Now look at this line to see how total is computed: https://github.com/python/cpython/blob/3.9/Lib/random.py#L500

total = cum_weights[-1] + 0.0

That means your total is always 1.0, because the last value of cum_weights is 1. So random()*total is always less than 1, which is smaller than every value bisect compares it against (10 and 5), and bisect always returns the first index, 0, of your population. That is why your output contains only 'apple'. Even if you run it a thousand times, your output list will contain nothing but 'apple' with your current cum_weights.
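The selection step described in this answer can be sketched in a few lines. This is a minimal re-creation, assuming the CPython 3.9 logic linked above (scale random() by the last cumulative weight, then bisect with the search capped at the last index); pick_index is a hypothetical helper, not part of the random module.

```python
import random
from bisect import bisect


def pick_index(cum_weights, u):
    # Hypothetical helper mirroring the linked CPython logic (a sketch):
    # scale a uniform float u in [0, 1) by the LAST cumulative weight,
    # then bisect into cum_weights without searching past index n - 1.
    total = cum_weights[-1] + 0.0
    hi = len(cum_weights) - 1
    return bisect(cum_weights, u * total, 0, hi)


population = ["apple", "banana", "cherry"]
bad = [10, 5, 1]     # decreasing: total is 1.0, so u * total < 1
good = [10, 15, 16]  # running sums of [10, 5, 1]: total is 16.0

rng = random.Random(0)
samples = [rng.random() for _ in range(1000)]

bad_picks = {population[pick_index(bad, u)] for u in samples}
good_picks = {population[pick_index(good, u)] for u in samples}
print(bad_picks)  # {'apple'}
print(sorted(good_picks))
```

With the decreasing list every scaled value is below 1, so both comparisons inside bisect go left and index 0 is returned every time; with the running sums the draws spread over all three items.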

Upvotes: 1

devReddit

Reputation: 2947

You're mixing up relative weights and cumulative weights. The cum_weights=[10, 5, 1] argument to random.choices is taken as the cumulative weights themselves; it isn't accumulated any further.
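Because the list is used as-is, a decreasing list is accepted silently rather than rejected. If you want a guard against this mistake, a small wrapper can check the order first. This is only a sketch: checked_choices is a hypothetical name, and the non-decreasing check is an assumption about what you want to enforce, not something random.choices does itself.

```python
import random


def checked_choices(population, cum_weights, k=1):
    """Hypothetical wrapper: reject cum_weights that ever decrease,
    then delegate to random.choices unchanged."""
    if any(b < a for a, b in zip(cum_weights, cum_weights[1:])):
        raise ValueError(f"cum_weights must be non-decreasing, got {cum_weights}")
    return random.choices(population, cum_weights=cum_weights, k=k)


mylist = ["apple", "banana", "cherry"]
print(checked_choices(mylist, [10, 15, 16], k=5))
try:
    checked_choices(mylist, [10, 5, 1], k=5)
except ValueError as exc:
    print(exc)  # cum_weights must be non-decreasing, got [10, 5, 1]
```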

Upvotes: 0

Tim Roberts

Reputation: 54733

Saying weights=[10,5,1] is the same as saying cum_weights=[10,15,16]. The cum_weights values must be in non-decreasing order for them to make sense; a decreasing list like yours confuses the algorithm. choices uses the last value of cum_weights as its random range, so in your case it draws a float from [0, 1). That number is always smaller than the first cumulative weight, 10, so it always falls in the "apple" slot, and choices always picks "apple".
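One way to check the equivalence of weights=[10,5,1] and cum_weights=[10,15,16] is to give two generators the same seed and compare their picks. This sketch assumes CPython's behaviour of converting weights to running sums internally before sampling; the seed 42 is arbitrary.

```python
import random

mylist = ["apple", "banana", "cherry"]

# Two generators seeded identically yield the same stream of random() values,
# so identical picks mean the two weight specifications are treated the same.
a = random.Random(42).choices(mylist, weights=[10, 5, 1], k=20)
b = random.Random(42).choices(mylist, cum_weights=[10, 15, 16], k=20)
print(a == b)  # True
```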

Upvotes: 0

Corralien

Reputation: 120489

When you have relative weights, the cumulative weights are the running sums of those values.

Your cum_weights should be: [10, 15, 16]

import random

mylist = ["apple", "banana", "cherry"]
print(random.choices(mylist, cum_weights=[10, 15, 16], k=14))

Example output (random, so it varies between runs):

['apple', 'banana', 'cherry', 'banana', 'apple', 'banana', 'apple', 'apple', 'banana', 'banana', 'apple', 'banana', 'banana', 'banana']
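Rather than writing the running sums by hand, itertools.accumulate can compute them from the relative weights. The sketch below also draws many samples and compares each item's observed share with weight / total (10/16, 5/16, 1/16); the sample size of 16,000 is an arbitrary choice, and the observed numbers will differ slightly on every run.

```python
import random
from collections import Counter
from itertools import accumulate

mylist = ["apple", "banana", "cherry"]
weights = [10, 5, 1]

# Running sums turn relative weights into cumulative weights.
cum_weights = list(accumulate(weights))
print(cum_weights)  # [10, 15, 16]

# With many draws, each item's share should approach weight / total.
n_draws = 16_000
counts = Counter(random.choices(mylist, cum_weights=cum_weights, k=n_draws))
total = cum_weights[-1]
for item, w in zip(mylist, weights):
    print(f"{item}: observed {counts[item] / n_draws:.3f}, expected {w / total:.3f}")
```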

Upvotes: 1
