user188606
user188606

Reputation:

How to select an item from a list with known percentages in Python

I wish to select a random word from a list where the is a known chance for each word, for example:

Fruit with Probability

Orange 0.10 Apple 0.05 Mango 0.15 etc

How would be the best way of implementing this? The actual list I will take from is up to 100 items longs and the % do not all tally to 100 % they do fall short to account for the items that had a really low chance of occurrence. I would ideally like to take this from a CSV which is where I store this data. This is not a time critical task.

Thank you for any advice on how best to proceed.

Upvotes: 4

Views: 4444

Answers (7)

Tarik
Tarik

Reputation: 11209

from numpy.random import multinomial
import numpy as np

def pickone(dist):
    return np.where(multinomial(1, dist) == 1)[0][0]

if __name__ == '__main__':
    lst = [ ('Orange', 0.10), ('Apple', 0.05), ('Mango', 0.15), ('etc', 0.70) ]
    dist = [p[1] for p in lst]
    
    N = 10000
    draws = np.array([pickone(dist) for i in range(N)], dtype=int)
    hist = np.histogram(draws, bins=[i for i in range(len(dist)+1)])[0]
    for i in range(len(lst)):
        print(f'{lst[i]} {hist[i]/N}')

Upvotes: 0

job
job

Reputation: 9273

What you want is to draw from a multinomial distribution. Assuming you have two lists of items and probabilities, and the probabilities sum to 1 (if not, just add some default value to cover the extra):

def choose(items,chances):
    import random
    p = chances[0]
    x = random.random()
    i = 0
    while x > p :
        i = i + 1
        p = p + chances[i]
    return items[i]

Upvotes: 2

steveha
steveha

Reputation: 76725

lst = [ ('Orange', 0.10), ('Apple', 0.05), ('Mango', 0.15), ('etc', 0.69) ]

x = 0.0
lst2 = []
for fruit, chance in lst:
    low = x
    high = x + chance
    tup = (low, high, fruit)
    lst2.append(tup)
    x += chance

if x > 1.0:
    raise ValueError, "chances add up to more than 100%"

low = x
high = 1.0
tup = (low, high, None)
lst2.append(tup)

import random

def pick_one(lst2):
    if lst2[0][2] is None:
        raise ValueError, "no valid values to choose"
    while True:
        r = random.random()
        for low, high, fruit in lst2:
            if low <= r < high:
                if fruit is None:
                    break  # try again with a different random value
                else:
                    return fruit

pick_one(lst2)


# test it 10,000 times
d = {}
for i in xrange(10000):
    x = pick_one(lst2)
    if x in d:
        d[x] += 1
    else:
        d[x] = 1

I think this is a little clearer. Instead of a tricky way of representing ranges as ascending values, we just keep ranges. Because we are testing ranges, we can simply walk forward through the lst2 values; no need to use reversed().

Upvotes: 0

Ants Aasma
Ants Aasma

Reputation: 54882

You can pick items with weighted probabilities if you assign each item a number range proportional to its probability, pick a random number between zero and the sum of the ranges and find what item matches it. The following class does exactly that:

from random import random

class WeightedChoice(object):
    def __init__(self, weights):
        """Pick items with weighted probabilities.

            weights
                a sequence of tuples of item and it's weight.
        """
        self._total_weight = 0.
        self._item_levels = []
        for item, weight in weights:
            self._total_weight += weight
            self._item_levels.append((self._total_weight, item))

    def pick(self):
        pick = self._total_weight * random()
        for level, item in self._item_levels:
            if level >= pick:
                return item

You can then load the CSV file with the csv module and feed it to the WeightedChoice class:

import csv

weighed_items = [(item,float(weight)) for item,weight in csv.reader(open('file.csv'))]
picker = WeightedChoice(weighed_items)
print(picker.pick())

Upvotes: 2

steveha
steveha

Reputation: 76725

lst = [ ('Orange', 0.10), ('Apple', 0.05), ('Mango', 0.15), ('etc', 0.69) ]

x = 0.0
lst2 = []
for fruit, chance in lst:
    tup = (x, fruit)
    lst2.append(tup)
    x += chance

tup = (x, None)
lst2.append(tup)

import random

def pick_one(lst2):
    if lst2[0][1] is None:
        raise ValueError, "no valid values to choose"
    while True:
        r = random.random()
        for x, fruit in reversed(lst2):
            if x <= r:
                if fruit is None:
                    break  # try again with a different random value
                else:
                    return fruit

pick_one(lst2)

This builds a new list, with ascending values representing the range of values that choose a fruit; then pick_one() walks backward down the list, looking for a value that is <= the current random value. We put a "sentinel" value on the end of the list; if the values don't reach 1.0, there is a chance of a random value that shouldn't match anything, and it will match the sentinel value and then be rejected. random.random() returns a random value in the range [0.0, 1.0) so it is certain to match something in the list eventually.

The nice thing here is that you should be able to have one value with a 0.000001 chance of matching, and it should actually match with that frequency; the other solutions, where you make a list with the items repeated and just use random.choice() to choose one, would require a list with a million items in it to handle this case.

Upvotes: 1

Mark
Mark

Reputation: 108557

import random
d= {'orange': 0.10, 'mango': 0.15, 'apple': 0.05}
weightedArray = []
for k in d:
  weightedArray+=[k]*int(d[k]*100)
random.choice(weightedArray)

EDITS

This is essentially what Brian said above.

Upvotes: -1

Brian
Brian

Reputation: 25834

One solution is to normalize the probabilities to integers and then repeat each element once per value (e.g. a list with 2 Oranges, 1 Apple, 3 Mangos). This is incredibly easy to do (from random import choice). If that is not practical, try the code here.

Upvotes: -1

Related Questions