Numpy random sampling from multiple distributions

Question

What is an efficient way to sample N numbers, where each number is a sampled by first choosing a random distribution from within a fixed predetermined list (using some specific discrete distribution), and then sampling from that chosen distribution.

So, for instance, if we want to choose 0 with probability 0.30, 1 with probability 0.30, and with probability 0.40 we want to choose any real number in [0,1) with uniform distribution, we can write:

np.choose(
    np.random.choice(2, size=N, p=[0.6, 0.4]),
    np.vstack((
        np.random.choice(2, size=(1,N)),
        np.random.uniform(size=(1,N))
    )))

However this generates NxD random numbers (where D is the number of distributions) and uses NxD space. Is there a more efficient vectorized (i.e no O(N) python for-loops) way of achieving this?

If not in the general case, can the above specific combined distribution be somehow efficiently generated otherwise?

hilberts_drinking_problem · Accepted Answer

You can use np.unique to determine exactly how many samples you need from each distribution. This requires a Python loop of size O(D). The space complexity is O(N), but the time complexity is still O(N * D). You can drop it to O(N) by computing the sparse index for each D.

N = 10
D = 2

distributions = [
  lambda n: np.random.choice(2, size=n),
  lambda n: np.random.uniform(n),
]

ds = np.random.choice(D, p=[0.6, 0.4], size=N)
uniques, inverse, counts = np.unique(ds, return_inverse=True, return_counts=True)
result = np.zeros(N)
for d, c in zip(uniques, counts):
  result[inverse==d] = distributions[d](c)

Numpy random sampling from multiple distributions

Answers (1)

Related Questions