Reputation: 13
I have three arrays of type numpy.ndarray with dimensions (n by 1), named amplitude, distance and weight. I would like to use selected entries of the amplitude array, based on their respective distance- and weight-values. For example I would like to find the indices of the entries within a certain distance range, so I write:
index = np.where( (distance<10) & (distance>=5) )
and I would then proceed by using the values from amplitude[index].
This works perfectly well as long as I only use one array for specifying the conditions. When I try for example
index = np.where( (distance<10) & (distance>=5) & (weight>0.8) )
the operation becomes super-slow. Why is that, and is there a better way for this task? In fact, I eventually want to use many conditions from something like 6 different arrays.
Upvotes: 1
Views: 1082
Reputation: 150957
This is just a guess, but perhaps numpy is broadcasting your arrays? If the arrays are the exact same shape, then numpy won't broadcast them:
>>> distance = numpy.arange(5) > 2
>>> weight = numpy.arange(5) < 4
>>> distance.shape, weight.shape
((5,), (5,))
>>> distance & weight
array([False, False, False, True, False], dtype=bool)
But if they have different shapes, and the shapes are broadcastable, then it will. Although (n,), (n, 1), and (1, n) are all arguably "n by 1" arrays, they aren't all the same:
>>> distance[None,:].shape, weight[:,None].shape
((1, 5), (5, 1))
>>> distance[None,:]
array([[False, False, False, True, True]], dtype=bool)
>>> weight[:,None]
array([[ True],
       [ True],
       [ True],
       [ True],
       [False]], dtype=bool)
>>> distance[None,:] & weight[:,None]
array([[False, False, False,  True,  True],
       [False, False, False,  True,  True],
       [False, False, False,  True,  True],
       [False, False, False,  True,  True],
       [False, False, False, False, False]], dtype=bool)
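A quick way to check whether this is what's happening in your case is to compare the shapes of the inputs with the shape of the combined mask. The sketch below reuses the names from the question, but the array contents and the mismatched shapes are made up for illustration; I don't know what your real arrays look like.

```python
import numpy as np

n = 6
# Assumed data: distance is flat, weight is a column vector.
distance = np.linspace(0.0, 12.0, n)               # shape (n,)
weight = np.linspace(0.5, 1.0, n).reshape(n, 1)    # shape (n, 1)

# Same kind of combined condition as in the question.
mask = (distance < 10) & (distance >= 5) & (weight > 0.8)

# If the mask has more elements than the inputs, the operands were broadcast.
print(distance.shape, weight.shape, mask.shape)
print("broadcast happened:", mask.size > distance.size)
```

If the mask comes out two-dimensional with n*n elements, the operands were broadcast against each other rather than compared element by element.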
In addition to returning undesired results, this could be causing a big slowdown if the arrays are even moderately large:
>>> distance = numpy.arange(5000) > 500
>>> weight = numpy.arange(5000) < 4500
>>> %timeit distance & weight
100000 loops, best of 3: 8.17 us per loop
>>> %timeit distance[:,None] & weight[None,:]
10 loops, best of 3: 48.6 ms per loop
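If that turns out to be the cause, one possible fix is to flatten every array to shape (n,) before comparing, and then combine all the conditions into a single boolean mask. This is only a sketch under that assumption: the random data and the deliberately mismatched shapes are invented, and np.logical_and.reduce is just one convenient way to AND a list of conditions together when there are many of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Assumed data with deliberately inconsistent "n by 1" shapes.
amplitude = rng.random((n, 1))        # shape (n, 1)
distance = rng.random(n) * 20.0       # shape (n,)
weight = rng.random((1, n))           # shape (1, n)

# Flatten everything to shape (n,) so no broadcasting can occur.
amplitude, distance, weight = (np.ravel(a) for a in (amplitude, distance, weight))

# Collect the conditions in a list; more arrays can be added the same way.
conditions = [distance < 10, distance >= 5, weight > 0.8]

# AND all conditions together element by element; the result has shape (n,).
mask = np.logical_and.reduce(conditions)

# Boolean indexing selects the matching amplitudes; np.where is not needed.
selected = amplitude[mask]
print(mask.shape, selected.shape)
```

Indexing with the boolean mask directly also avoids the extra step of converting it to integer indices with np.where.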
Upvotes: 2