Reputation: 3241
I want to select only the rows that do not contain any 0.
import numpy as np
data = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 0, 9, 10],
                 [11, 12, 13, 14, 15],
                 [16, 17, 18, 19, 0]])
The result would be:
array([[ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15]])
Upvotes: 3
Views: 7273
Reputation: 221704
You can detect all zeros with data == 0, which gives you a boolean array, then apply np.any along each row (axis=1) and negate the result. Alternatively, you can detect all non-zeros with data != 0 and then apply np.all along each row. Either way you get a row mask that selects the rows without any zero.
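To make the two mask-building routes concrete, here is a minimal sketch on the question's sample array that builds the intermediate boolean arrays step by step; the variable names (has_zero, keep_any, keep_all) are illustrative only.

```python
import numpy as np

# Sample data from the question
data = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 0, 9, 10],
                 [11, 12, 13, 14, 15],
                 [16, 17, 18, 19, 0]])

# Route 1: flag zeros, ask "does the row contain any zero?", then negate
has_zero = np.any(data == 0, axis=1)   # one bool per row: [False, True, False, True]
keep_any = ~has_zero                   # logical NOT of a bool array

# Route 2: flag non-zeros, ask "is every entry in the row non-zero?"
keep_all = np.all(data != 0, axis=1)   # one bool per row: [True, False, True, False]

# Both masks select the same rows
rows_without_zeros = data[keep_all]
print(rows_without_zeros)
```

By De Morgan's law, "not any element is zero" is the same as "all elements are non-zero", which is why the two masks always agree.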
One can also use np.einsum in place of np.any, which I personally think is crazy, but in a good way, as it gives a noticeable performance boost, as the timings later in this answer confirm.
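Why np.einsum can stand in for np.any here: with the subscripts 'ij->i', einsum sums over the columns of each row, and for a boolean input the result stays boolean, so that "sum" behaves like a logical OR along the row. The sketch below checks this on the question's data. It relies on the input being a boolean mask; with an integer input the row sums would be integers, and ~ would then be a bitwise NOT instead of a logical negation.

```python
import numpy as np

data = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 0, 9, 10],
                 [11, 12, 13, 14, 15],
                 [16, 17, 18, 19, 0]])

zero_mask = data == 0                      # boolean array, same shape as data

# 'ij->i' reduces over j (the columns); on a bool array the result stays bool,
# so each entry is True when the row holds at least one True (i.e. any zero)
row_has_zero = np.einsum('ij->i', zero_mask)

# Same dtype and values as the np.any reduction
assert row_has_zero.dtype == np.bool_
assert np.array_equal(row_has_zero, np.any(zero_mask, axis=1))

rows_without_zeros = data[~row_has_zero]
print(rows_without_zeros)
```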
Thus, you have three approaches, listed next.
Approach #1:
rows_without_zeros = data[~np.any(data==0, axis=1)]
Approach #2:
rows_without_zeros = data[np.all(data!=0, axis=1)]
Approach #3:
rows_without_zeros = data[~np.einsum('ij->i', data == 0)]
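If you want to switch between the three approaches or confirm they agree on larger inputs, a minimal sketch is to wrap them in one function. The helper name drop_rows_with_zeros and its method parameter are hypothetical, introduced only for this illustration; the random data uses the same value range as the timing test so that zeros actually occur.

```python
import numpy as np

def drop_rows_with_zeros(arr, method="any"):
    """Return the rows of a 2-D array that contain no zero.

    Hypothetical helper wrapping the three approaches above.
    """
    if method == "any":        # Approach #1
        mask = ~np.any(arr == 0, axis=1)
    elif method == "all":      # Approach #2
        mask = np.all(arr != 0, axis=1)
    elif method == "einsum":   # Approach #3 (needs a boolean mask as input)
        mask = ~np.einsum('ij->i', arr == 0)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return arr[mask]

# Random integers in [-10, 10), the same range as the timing test
rng = np.random.default_rng(0)
sample = rng.integers(-10, 10, size=(1000, 10))

results = {m: drop_rows_with_zeros(sample, m) for m in ("any", "all", "einsum")}

# All three methods keep exactly the same rows, and none of them contains a zero
assert all(np.array_equal(results["any"], r) for r in results.values())
assert not (results["any"] == 0).any()
```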
Runtime tests:
This section times the three solutions proposed here and also includes timings for @Ashwini Chaudhary's approach, which is also based on np.all but does not use a mask or boolean array (at least not in the front end).
In [129]: data = np.random.randint(-10,10,(10000,10))
In [130]: %timeit data[np.all(data, axis=1)]
1000 loops, best of 3: 1.09 ms per loop
In [131]: %timeit data[np.all(data!=0, axis=1)]
1000 loops, best of 3: 1.03 ms per loop
In [132]: %timeit data[~np.any(data==0,1)]
1000 loops, best of 3: 1 ms per loop
In [133]: %timeit data[~np.einsum('ij->i',data ==0)]
1000 loops, best of 3: 825 µs per loop
Thus, it seems that supplying boolean masks to np.all or np.any gives a small performance boost (about 9%) over the non-mask approach. With einsum, you are looking at around a 20% improvement over the np.any and np.all based approaches, which is not bad!
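To rerun the comparison outside IPython, here is a minimal sketch using the standard timeit module in place of the %timeit magic. Absolute numbers, and possibly the ranking, depend on the NumPy version and the machine, so it reproduces the method rather than the figures above. The repeat and number values are arbitrary choices.

```python
import timeit
import numpy as np

# Same shape and value range as the IPython session above
data = np.random.randint(-10, 10, (10000, 10))

candidates = {
    "np.all(data) (no explicit mask)": lambda: data[np.all(data, axis=1)],
    "np.all(data != 0)": lambda: data[np.all(data != 0, axis=1)],
    "~np.any(data == 0)": lambda: data[~np.any(data == 0, axis=1)],
    "~np.einsum('ij->i', data == 0)": lambda: data[~np.einsum('ij->i', data == 0)],
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats, each timing 100 calls, reported per call like %timeit
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    timings[name] = best
    print(f"{name:35s} {best * 1e6:8.1f} µs per loop")
```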
Upvotes: 4
Reputation: 251156
Use numpy.all, which treats every non-zero value as True:
>>> data[np.all(data, axis=1)]
array([[ 1, 2, 3, 4, 5],
[11, 12, 13, 14, 15]])
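np.all(data, axis=1) works without an explicit comparison because np.all tests the truth value of each element, and for numbers only zero is False. The sketch below, with made-up sample values, checks that rows with negative entries are kept and that the result matches the explicit data != 0 mask.

```python
import numpy as np

# Made-up integer data with negatives; only 0 is falsy
data = np.array([[-3, 1, 2],
                 [4, 0, -5],
                 [-1, -2, -7]])

# np.all treats each element as a bool (non-zero -> True) before reducing per row
implicit = np.all(data, axis=1)
explicit = np.all(data != 0, axis=1)

assert np.array_equal(implicit, explicit)
print(data[implicit])
```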
Upvotes: 10