Max Ghenis
Max Ghenis

Reputation: 15863

Allocate an integer randomly across k bins with uniform distribution across allocations

I'm looking for an efficient Python function that randomly allocates an integer across k bins. That is, some function allocate(n, k) will produce a k-sized array of integers summing to n.

For example, allocate(3, 2) would produce [3, 0], [2, 1], [1, 2], or [0, 3] with equal probability (unlike Allocate an integer randomly across k bins, the allocations, not the items, should be uniformly distributed).

Upvotes: 3

Views: 364

Answers (2)

Michael Szczesny
Michael Szczesny

Reputation: 5026

I couldn't find a fast numpy solution without a massive (n x size x int32) space overhead. So here is a numba implementation of @RichardYannow's solution. Please remember to exclude the first call from benchmarks.

The sampling algorithm is a simplified adaptation of np.rng.choice, size=1 samples should be "np.squeezed" to remove the unused dimension.

import numpy as np
import numba as nb # tested with numba 0.55.1

@nb.njit
def allocate(n, k, size):
    idx = np.zeros((size, k+1), np.int32)
    s = np.empty(n+k, np.uint8)
    for j in range(size):
        s.fill(0)
        for i in range(n+1, n + k):
            x = np.random.randint(i) + 1
            if s[x]: x = i
            s[x] = 1
            idx[j, i - n] = x

        idx[j, 1:k].sort()
        idx[j, k] = n+k
        for i in range(k):
            idx[j, i] = idx[j, i+1] - idx[j, i] - 1
    return idx[:, :k]

allocate(4, 3, size=5)

Output

array([[1, 1, 2],
       [0, 3, 1],
       [1, 1, 2],
       [1, 1, 2],
       [1, 2, 1]], dtype=int32)

Checking uniform distribution across all possible allocations

from collections import Counter
Counter(tuple(x) for x in allocate(4, 3, size=10000))

Output

Counter({(0, 0, 4): 685,
         (0, 1, 3): 680,
         (0, 2, 2): 646,
         (0, 3, 1): 666,
         (0, 4, 0): 662,
         (1, 0, 3): 660,
         (1, 1, 2): 670,
         (1, 2, 1): 661,
         (1, 3, 0): 647,
         (2, 0, 2): 650,
         (2, 1, 1): 699,
         (2, 2, 0): 682,
         (3, 0, 1): 667,
         (3, 1, 0): 669,
         (4, 0, 0): 656})

Upvotes: 0

Richard Yannow
Richard Yannow

Reputation: 96

Using the "stars and bars" approach, we can transform this into a question of picking k-1 positions for possible dividers from a list of n+k-1 possible positions. (Wikipedia proof)

from random import sample

def allocate(n,k):
    dividers = sample(range(1, n+k), k-1)
    dividers = sorted(dividers)
    dividers.insert(0, 0)
    dividers.append(n+k)
    return [dividers[i+1]-dividers[i]-1 for i in range(k)]
    
print(allocate(4,3))

There are ((n+k-1) choose (k-1)) possible allocations that fit your criteria, each corresponding to a specific way of choosing k places to put the dividers among n+k-1 spots for objects, and this is equally likely to result in each one of them.

(Note the subtle difference to the proposed-in-comments existing answer to a similar question: This question is asking for an ordered series of non-negative integers, while the proposed answer gives an ordered series of positive integers. The naive modification of selecting the spots with replacement instead of without replacement does allow the full set of non-negative integer distributions, but it does not leave each distribution equally likely. Consider allocate(4,3): the only way to get [0, 0, 4] is to roll (0, 0), but you can get [1, 2, 1] by rolling (1, 3) or (3, 1)).

Upvotes: 5

Related Questions