user1815106

Reputation: 65

Fast way to select n items (drawn from a Poisson distribution) for each element in array x

I am having some trouble with solving a problem I encountered.

I have an array with prices:

>>> x = np.random.randint(10, size=10)
array([6, 1, 7, 6, 9, 0, 8, 2, 1, 8])

And a (randomly) generated array of Poisson distributed arrivals:

>>> arrivals = np.random.poisson(1, size=10)
array([4, 0, 1, 1, 3, 2, 1, 3, 2, 1])

Each arrival should be associated with the price at the same index. In the example above, the first element ( x[0] ) should be selected 4 times ( arrivals[0] ), the second element ( x[1] ) should be selected 0 times ( arrivals[1] ), and so on. The result should therefore be:

array([6, 6, 6, 6, 7, 6, 9, 9, 9, 0, 0, 8, 2, 2, 2, 1, 1, 8])

Is there any (fast) way to accomplish this, without iterating over the arrays? Any help would be greatly appreciated.

Upvotes: 5

Views: 564

Answers (2)

unutbu

Reputation: 879859

You could use np.repeat:

In [43]: x = np.array([6, 1, 7, 6, 9, 0, 8, 2, 1, 8])

In [44]: arrivals = np.array([4, 0, 1, 1, 3, 2, 1, 3, 2, 1])

In [45]: np.repeat(x, arrivals)
Out[45]: array([6, 6, 6, 6, 7, 6, 9, 9, 9, 0, 0, 8, 2, 2, 2, 1, 1, 8])

Note, however, that for certain calculations it may be possible to avoid forming this intermediate array at all. See, for example, scipy.stats.binned_statistic.
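As a rough illustration of skipping the intermediate array, here is a minimal sketch that assumes the expanded array is only needed for aggregates such as a total or a mean price per arrival. The variable names (expanded, total_direct, mean_direct) are made up for this example; whether the shortcut applies depends on what you actually compute afterwards.

```python
import numpy as np

x = np.array([6, 1, 7, 6, 9, 0, 8, 2, 1, 8])
arrivals = np.array([4, 0, 1, 1, 3, 2, 1, 3, 2, 1])

# Reference: build the expanded array explicitly.
expanded = np.repeat(x, arrivals)
total_expanded = expanded.sum()

# Same total, treating the arrival counts as weights instead of expanding.
total_direct = np.dot(x, arrivals)

# Mean price per arrival, again without the intermediate array.
mean_direct = np.average(x, weights=arrivals)

print(total_expanded, total_direct)  # both 88
```

For large arrays this keeps memory proportional to len(x) rather than to arrivals.sum().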

Upvotes: 6

Magellan88

Reputation: 2573

I don't really see how you could do that without looping at all. What you could do is create the result array prior to looping; that way you don't need to concatenate afterwards.

Result = np.empty( arrivals.sum(), dtype='i' )

and then change the values of that array blockwise:

Result_position = np.r_[ [0], arrivals.cumsum() ]
for i, xx in enumerate(x):
    Result[ Result_position[i]:Result_position[i+1] ] = xx
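Put together, the preallocate-and-fill idea can be sketched as a self-contained function. The name fill_blockwise is hypothetical, and the output dtype is taken from x here rather than fixed to 'i', which is an assumption on my part. The final comparison against np.repeat is only a sanity check.

```python
import numpy as np

def fill_blockwise(x, arrivals):
    """Repeat x[i] arrivals[i] times by filling a preallocated array."""
    # Allocate the output once, so no concatenation is needed later.
    result = np.empty(arrivals.sum(), dtype=x.dtype)
    # edges[i]:edges[i + 1] is the slice of the output owned by x[i].
    edges = np.r_[0, arrivals.cumsum()]
    for i, value in enumerate(x):
        result[edges[i]:edges[i + 1]] = value
    return result

x = np.array([6, 1, 7, 6, 9, 0, 8, 2, 1, 8])
arrivals = np.array([4, 0, 1, 1, 3, 2, 1, 3, 2, 1])

blockwise = fill_blockwise(x, arrivals)
print(np.array_equal(blockwise, np.repeat(x, arrivals)))  # True
```

The loop still runs once per element of x, so it will generally be slower than a fully vectorised call on long arrays.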

Upvotes: 0
