Roman

Reputation: 3241

Numpy select non-zero rows

I want to select only the rows that do not contain any 0 element.

data = np.array([[1,2,3,4,5],
                [6,7,0,9,10],
                [11,12,13,14,15],
                [16,17,18,19,0]])

The result would be:

array([[1,2,3,4,5],
       [11,12,13,14,15]])

Upvotes: 3

Views: 7273

Answers (2)

Divakar

Reputation: 221704

You can detect all zeros with data == 0, which gives you a boolean array, and then apply np.any along each row of it. Alternatively, you can detect all non-zeros with data != 0 and then apply np.all to get a mask of the rows that contain no zeros.

One can also use np.einsum in place of np.any, which I personally think is crazy, but in a good way, as it gives a noticeable performance boost, as the runtime tests later in this answer confirm.

Thus, you would have three approaches as listed next.

Approach #1:

rows_without_zeros = data[~np.any(data==0, axis=1)]

Approach #2:

rows_without_zeros = data[np.all(data!=0, axis=1)]

Approach #3:

rows_without_zeros = data[~np.einsum('ij->i',data ==0)]
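As a quick sanity check, here is a minimal self-contained sketch that runs the three approaches on the sample array from the question and compares the masks they produce. The names mask_any, mask_all and mask_einsum are made up for this illustration. It assumes np.einsum accumulates boolean input with a logical OR and returns a boolean array, which is what makes the ~ negation act as a logical NOT in Approach #3.

```python
import numpy as np

# Sample array from the question.
data = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 0, 9, 10],
                 [11, 12, 13, 14, 15],
                 [16, 17, 18, 19, 0]])

# Approach #1: flag rows that contain a zero, then invert the flag.
mask_any = ~np.any(data == 0, axis=1)

# Approach #2: keep rows in which every element is non-zero.
mask_all = np.all(data != 0, axis=1)

# Approach #3: reduce the boolean array along each row with einsum, then invert.
mask_einsum = ~np.einsum('ij->i', data == 0)

# The mask can be reused to index the original array.
rows_without_zeros = data[mask_all]
print(mask_all)
print(rows_without_zeros)
```

Keeping the mask in its own variable is handy if the same row selection also has to be applied to another array with the same number of rows.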

Runtime tests -

This section times the three approaches proposed in this answer and also includes timings for @Ashwini Chaudhary's approach, which is also np.all based but does not build an explicit mask or boolean array (at least not in the frontend).

In [129]: data = np.random.randint(-10,10,(10000,10))

In [130]: %timeit data[np.all(data, axis=1)]
1000 loops, best of 3: 1.09 ms per loop

In [131]: %timeit data[np.all(data!=0, axis=1)]
1000 loops, best of 3: 1.03 ms per loop

In [132]: %timeit data[~np.any(data==0,1)]
1000 loops, best of 3: 1 ms per loop

In [133]: %timeit data[~np.einsum('ij->i',data ==0)]
1000 loops, best of 3: 825 µs per loop

Thus, it seems that supplying masks to np.all or np.any gives a small performance boost (about 9%) over the non-mask based approach. With einsum, you are looking at around a 20% improvement over the np.any and np.all based approaches, which is not bad!
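To repeat the comparison outside IPython, a rough sketch using the standard timeit module could look like the following. The candidates dictionary, its labels and the repeat count of 200 are choices made for this illustration only, and the absolute numbers will differ by machine and NumPy version, so treat the output as indicative.

```python
import timeit

import numpy as np

# Same shape and value range as in the timings above; the seed is arbitrary.
np.random.seed(0)
data = np.random.randint(-10, 10, (10000, 10))

# Hypothetical labels mapped to the four expressions being compared.
candidates = {
    "np.all(data)": lambda: data[np.all(data, axis=1)],
    "np.all(data != 0)": lambda: data[np.all(data != 0, axis=1)],
    "~np.any(data == 0)": lambda: data[~np.any(data == 0, axis=1)],
    "~np.einsum(data == 0)": lambda: data[~np.einsum('ij->i', data == 0)],
}

# Confirm that every candidate selects the same rows before timing anything.
results = {label: fn() for label, fn in candidates.items()}
reference = results["np.all(data != 0)"]
all_agree = all(np.array_equal(rows, reference) for rows in results.values())

# Average wall-clock time per call over 200 calls.
repeats = 200
for label, fn in candidates.items():
    seconds = timeit.timeit(fn, number=repeats) / repeats
    print(f"{label:>24}: {seconds * 1e6:8.1f} µs per call")
```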

Upvotes: 4

Ashwini Chaudhary

Reputation: 251156

Use numpy.all:

>>> data[np.all(data, axis=1)]
array([[ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15]])
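This works because np.all tests each element for truthiness, and for numeric data every non-zero value counts as True. Here is a small sketch of that equivalence on a made-up float array (values, implicit_mask and explicit_mask are names invented for this example):

```python
import numpy as np

# Made-up float array with negative values and a negative zero.
values = np.array([[1.5, -2.0, 3.0],
                   [0.0, 4.0, 5.0],
                   [7.0, 8.0, -0.0]])

# np.all on the raw numbers: any non-zero element is treated as True.
implicit_mask = np.all(values, axis=1)

# The same test spelled out with an explicit comparison.
explicit_mask = np.all(values != 0, axis=1)

print(implicit_mask)
print(explicit_mask)
print(values[implicit_mask])
```

Negative numbers count as non-zero, and -0.0 counts as zero in both forms, so the two masks agree here.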

Upvotes: 10
