Reputation: 15863
I'm looking for an efficient Python function that randomly allocates an integer across k
bins.
That is, some function allocate(n, k)
will produce a k
-sized array of integers summing to n
.
For example, allocate(3, 2)
would produce [3, 0]
, [2, 1]
, [1, 2]
, or [0, 3]
with equal probability (unlike Allocate an integer randomly across k bins, the allocations, not the items, should be uniformly distributed).
Upvotes: 3
Views: 364
Reputation: 5026
I couldn't find a fast numpy
solution without a massive (n x size x int32) space overhead. So here is a numba
implementation of @RichardYannow's solution. Please remember to exclude the first call from benchmarks.
The sampling algorithm is a simplified adaptation of np.rng.choice
, size=1 samples should be "np.squeezed
" to remove the unused dimension.
import numpy as np
import numba as nb # tested with numba 0.55.1
@nb.njit
def allocate(n, k, size):
idx = np.zeros((size, k+1), np.int32)
s = np.empty(n+k, np.uint8)
for j in range(size):
s.fill(0)
for i in range(n+1, n + k):
x = np.random.randint(i) + 1
if s[x]: x = i
s[x] = 1
idx[j, i - n] = x
idx[j, 1:k].sort()
idx[j, k] = n+k
for i in range(k):
idx[j, i] = idx[j, i+1] - idx[j, i] - 1
return idx[:, :k]
allocate(4, 3, size=5)
Output
array([[1, 1, 2],
[0, 3, 1],
[1, 1, 2],
[1, 1, 2],
[1, 2, 1]], dtype=int32)
Checking uniform distribution across all possible allocations
from collections import Counter
Counter(tuple(x) for x in allocate(4, 3, size=10000))
Output
Counter({(0, 0, 4): 685,
(0, 1, 3): 680,
(0, 2, 2): 646,
(0, 3, 1): 666,
(0, 4, 0): 662,
(1, 0, 3): 660,
(1, 1, 2): 670,
(1, 2, 1): 661,
(1, 3, 0): 647,
(2, 0, 2): 650,
(2, 1, 1): 699,
(2, 2, 0): 682,
(3, 0, 1): 667,
(3, 1, 0): 669,
(4, 0, 0): 656})
Upvotes: 0
Reputation: 96
Using the "stars and bars" approach, we can transform this into a question of picking k-1 positions for possible dividers from a list of n+k-1 possible positions. (Wikipedia proof)
from random import sample
def allocate(n,k):
dividers = sample(range(1, n+k), k-1)
dividers = sorted(dividers)
dividers.insert(0, 0)
dividers.append(n+k)
return [dividers[i+1]-dividers[i]-1 for i in range(k)]
print(allocate(4,3))
There are ((n+k-1) choose (k-1)) possible allocations that fit your criteria, each corresponding to a specific way of choosing k places to put the dividers among n+k-1 spots for objects, and this is equally likely to result in each one of them.
(Note the subtle difference to the proposed-in-comments existing answer to a similar question: This question is asking for an ordered series of non-negative integers, while the proposed answer gives an ordered series of positive integers. The naive modification of selecting the spots with replacement instead of without replacement does allow the full set of non-negative integer distributions, but it does not leave each distribution equally likely. Consider allocate(4,3): the only way to get [0, 0, 4] is to roll (0, 0), but you can get [1, 2, 1] by rolling (1, 3) or (3, 1)).
Upvotes: 5