Sal
Sal

Reputation: 1693

How to distribute an integer number across bins by probability

I have an integer value k and a list of probabilities which sum to 1. I want to create a new list which breaks up k into smaller integers according to the probabilities in my list.

The issues are that I do not know the size of my list in advance and multiple probabilities in my list could be small and equal, so my new list doesn't always sum to k.

For example:

> k = 10
> l = [0.12, 0.12, 0.04, 0.04, 0.02, 0.02, 0.03, 0.03, 0.02, 0.02, 0.27, 0.27]
> sum(l)
1.0
> new_l = [int(round(k*v)) for v in a]
> print(new_l)
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3]
> sum(new_l)
8

How can I enforce that sum(new_l) == k?

Maybe this isn't even possible, at least with my method. Even if k is larger than len(l), if k is odd it doesn't seem to ever work:

>>> for k in [10, 11, 12, 13, 14, 15, 16, 20, 50, 75, 101, 1001, 1002, 10001, 10002, 100001, 100002]:
...     print(k, sum([round(k*v,0) for v in a]))
... 
(10, 8.0)
(11, 8.0)
(12, 8.0)
(13, 14.0)
(14, 14.0)
(15, 14.0)
(16, 14.0)
(20, 18.0)
(50, 52.0)
(75, 76.0)
(101, 100.0)
(1001, 1000.0)
(1002, 1002.0)
(10001, 10000.0)
(10002, 10002.0)
(100001, 100000.0)
(100002, 100002.0)

Upvotes: 1

Views: 574

Answers (3)

Alassane Ndiaye
Alassane Ndiaye

Reputation: 4777

You are going about this the wrong way. What you are trying to do is select an option from your list n times. This can be achieved using the numpy.random.choice function.

Here is corresponding code:

from numpy.random import choice
from collections import Counter
draw = choice(range(12), 100,
              p=[0.12, 0.12, 0.04, 0.04, 0.02, 0.02, 0.03, 0.03, 0.02, 0.02, 0.27, 0.27])
counter = Counter(draw)
print(counter.values())
print(sum(counter.values()))

This will output:

dict_values([16, 13, 7, 3, 1, 1, 2, 2, 3, 2, 31, 19])
100

Upvotes: 0

Yishai E
Yishai E

Reputation: 521

You can do this by randomization:

sl = np.cumsum(l) # cumulative probability
b = np.random.rand(10) # a random number for each of your integers

new_l = np.zeros(l.shape)    
for i in range(k):
    iinsert = np.where(b[i]>sl)[0][0] # first entry where b is larger than the cumumlative prob
    new_l[iinsert] += 1

Upvotes: -1

olooney
olooney

Reputation: 2483

How about:

k = 10
l = [0.12, 0.12, 0.04, 0.04, 0.02, 0.02, 0.03, 0.03, 0.02, 0.02, 0.27, 0.27]
targets = [k*v for v in l]
new_l = [ int(v) for v in targets]

while sum(new_l) < k:
    residuals = [ t - v for t,v in zip(targets, new_l) ]
    index = residuals.index( max(residuals) )
    new_l[index] += 1

This starts by finding the largest integer n such that n is less than k*v, then fixing it up by incrementing the worst n until the sum(new_l) == k. It will never take more than O(len(l)) operations to complete.

Upvotes: 2

Related Questions