Reputation: 1775
What is an efficient way to sample N numbers, where each number is a sampled by first choosing a random distribution from within a fixed predetermined list (using some specific discrete distribution), and then sampling from that chosen distribution.
So, for instance, if we want to choose 0 with probability 0.30, 1 with probability 0.30, and with probability 0.40 we want to choose any real number in [0,1)
with uniform distribution, we can write:
np.choose(
np.random.choice(2, size=N, p=[0.6, 0.4]),
np.vstack((
np.random.choice(2, size=(1,N)),
np.random.uniform(size=(1,N))
)))
However this generates N
xD
random numbers (where D
is the number of distributions) and uses N
xD
space. Is there a more efficient vectorized (i.e no O(N)
python for-loops) way of achieving this?
If not in the general case, can the above specific combined distribution be somehow efficiently generated otherwise?
Upvotes: 1
Views: 1036
Reputation: 11602
You can use np.unique
to determine exactly how many samples you need from each distribution. This requires a Python loop of size O(D)
. The space complexity is O(N)
, but the time complexity is still O(N * D)
. You can drop it to O(N)
by computing the sparse index for each D
.
N = 10
D = 2
distributions = [
lambda n: np.random.choice(2, size=n),
lambda n: np.random.uniform(n),
]
ds = np.random.choice(D, p=[0.6, 0.4], size=N)
uniques, inverse, counts = np.unique(ds, return_inverse=True, return_counts=True)
result = np.zeros(N)
for d, c in zip(uniques, counts):
result[inverse==d] = distributions[d](c)
Upvotes: 1