Reputation: 1693
I have an integer value k
and a list of probabilities which sum to 1. I want to create a new list which breaks up k
into smaller integers according to the probabilities in my list.
The issues are that I do not know the size of my list in advance and multiple probabilities in my list could be small and equal, so my new list doesn't always sum to k
.
For example:
> k = 10
> l = [0.12, 0.12, 0.04, 0.04, 0.02, 0.02, 0.03, 0.03, 0.02, 0.02, 0.27, 0.27]
> sum(l)
1.0
> new_l = [int(round(k*v)) for v in a]
> print(new_l)
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3]
> sum(new_l)
8
How can I enforce that sum(new_l) == k
?
Maybe this isn't even possible, at least with my method. Even if k
is larger than len(l)
, if k
is odd it doesn't seem to ever work:
>>> for k in [10, 11, 12, 13, 14, 15, 16, 20, 50, 75, 101, 1001, 1002, 10001, 10002, 100001, 100002]:
... print(k, sum([round(k*v,0) for v in a]))
...
(10, 8.0)
(11, 8.0)
(12, 8.0)
(13, 14.0)
(14, 14.0)
(15, 14.0)
(16, 14.0)
(20, 18.0)
(50, 52.0)
(75, 76.0)
(101, 100.0)
(1001, 1000.0)
(1002, 1002.0)
(10001, 10000.0)
(10002, 10002.0)
(100001, 100000.0)
(100002, 100002.0)
Upvotes: 1
Views: 574
Reputation: 4777
You are going about this the wrong way. What you are trying to do is select an option from your list n
times. This can be achieved using the numpy.random.choice
function.
Here is corresponding code:
from numpy.random import choice
from collections import Counter
draw = choice(range(12), 100,
p=[0.12, 0.12, 0.04, 0.04, 0.02, 0.02, 0.03, 0.03, 0.02, 0.02, 0.27, 0.27])
counter = Counter(draw)
print(counter.values())
print(sum(counter.values()))
This will output:
dict_values([16, 13, 7, 3, 1, 1, 2, 2, 3, 2, 31, 19])
100
Upvotes: 0
Reputation: 521
You can do this by randomization:
sl = np.cumsum(l) # cumulative probability
b = np.random.rand(10) # a random number for each of your integers
new_l = np.zeros(l.shape)
for i in range(k):
iinsert = np.where(b[i]>sl)[0][0] # first entry where b is larger than the cumumlative prob
new_l[iinsert] += 1
Upvotes: -1
Reputation: 2483
How about:
k = 10
l = [0.12, 0.12, 0.04, 0.04, 0.02, 0.02, 0.03, 0.03, 0.02, 0.02, 0.27, 0.27]
targets = [k*v for v in l]
new_l = [ int(v) for v in targets]
while sum(new_l) < k:
residuals = [ t - v for t,v in zip(targets, new_l) ]
index = residuals.index( max(residuals) )
new_l[index] += 1
This starts by finding the largest integer n
such that n
is less than k*v
, then fixing it up by incrementing the worst n until the sum(new_l) == k
. It will never take more than O(len(l)) operations to complete.
Upvotes: 2