coucou

Reputation: 13

Numpy.where: very slow with conditions from two different arrays

I have three arrays of type numpy.ndarray with dimensions (n by 1), named amplitude, distance and weight. I would like to select entries of the amplitude array based on the corresponding distance and weight values. For example, to find the indices of the entries within a certain distance range, I write:

index = np.where( (distance<10) & (distance>=5) )

and I would then proceed by using the values from amplitude[index]. This works perfectly well as long as I only use one array for specifying the conditions. When I try for example

index = np.where( (distance<10) & (distance>=5) & (weight>0.8) )

the operation becomes super-slow. Why is that, and is there a better way for this task? In fact, I eventually want to use many conditions from something like 6 different arrays.

Upvotes: 1

Views: 1082

Answers (1)

senderle

Reputation: 150957

This is just a guess, but perhaps numpy is broadcasting your arrays? If the arrays are the exact same shape, then numpy won't broadcast them:

>>> import numpy
>>> distance = numpy.arange(5) > 2
>>> weight = numpy.arange(5) < 4
>>> distance.shape, weight.shape
((5,), (5,))
>>> distance & weight
array([False, False, False,  True, False], dtype=bool)

But if they have different shapes, and the shapes are broadcastable, then it will. Although (n,), (n, 1), and (1, n) are all arguably "n by 1" arrays, they aren't all the same:

>>> distance[None,:].shape, weight[:,None].shape
((1, 5), (5, 1))
>>> distance[None,:]
array([[False, False, False,  True,  True]], dtype=bool)
>>> weight[:,None]
array([[ True],
       [ True],
       [ True],
       [ True],
       [False]], dtype=bool)
>>> distance[None,:] & weight[:,None]
array([[False, False, False,  True,  True],
       [False, False, False,  True,  True],
       [False, False, False,  True,  True],
       [False, False, False,  True,  True],
       [False, False, False, False, False]], dtype=bool)
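One way to check for this before combining conditions is to ask numpy what shape the broadcast result would have. This is only a small sketch with made-up arrays (the names and sizes here are illustrative, not taken from the question); it relies on `np.broadcast`, which reports the result shape without computing the result:

```python
import numpy as np

# Illustrative arrays: one row vector and one column vector,
# both of which could loosely be described as "n by 1".
distance = np.arange(5).reshape(1, 5)
weight = np.arange(5).reshape(5, 1)

# np.broadcast only works out the resulting shape; it does not
# allocate the full n-by-n result.
result_shape = np.broadcast(distance, weight).shape

print(distance.shape, weight.shape, result_shape)
# (1, 5) (5, 1) (5, 5)
```

If the reported shape has more elements than either input, the arrays are being broadcast against each other rather than compared element by element.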

In addition to returning undesired results, this could be causing a big slowdown if the arrays are even moderately large, because the broadcast result has n * n elements instead of n:

>>> distance = numpy.arange(5000) > 500
>>> weight = numpy.arange(5000) < 4500
>>> %timeit distance & weight
100000 loops, best of 3: 8.17 us per loop
>>> %timeit distance[:,None] & weight[None,:]
10 loops, best of 3: 48.6 ms per loop

Upvotes: 2
