How can I choose random indicies based on probability?

Question

I have a list of numbers and I'm trying to write a function that will choose n random indices i such i's likelihood is percentages[i]

Function:

def choose_randomly(probabilities, n):
    percentages = accumulated_s(probabilities)
    result = []
    for i in range(n):
        r = random()
        for j in range(n):
            if r < percentages[j]:
                result = result + [j]
    return result

accumulated_s will just generate a corresponding list of probabilities.

I'm expecting results like this:

choose_randomly([1, 2, 3, 4], 2) -> [3 3 0]
choose_randomly([1, 2, 3, 4], 2) -> [1 3 1]

The problem is that this is not returning n indicies. Can anyone point out what I'm doing wrong? Thank you so much!

ShadowRanger · Accepted Answer

Once you've found the right range of probabilities, you're done; break out of the inner loop to generate the next value, or you'll act as if all probabilities above the correct threshold were matched as well:

    # Enumerate all percentages, not just first n
    for j, pct in enumerate(percentages):
        if r < pct:
            result.append(j)  # Don't create tons of temporary lists; mutate in place
            break  # <-- Don't add more results

Also note, if you have a lot of values in the set of probabilities, it may make sense to use functions from the bisect module to find the correct value, rather than scanning linearly each time; for a small number of entries in percentages, linear scanning is fine, but for a large number, O(log n) lookups may beat O(n) scans.

How can I choose random indicies based on probability?

Answers (1)

Related Questions