Reputation: 6382
I have a large numpy matrix M. Some of the rows of the matrix have all of their elements equal to zero, and I need to get the indices of those rows. The naive approach I'm considering is to loop through each row of the matrix and check each element.
What would be a better and faster way to accomplish this using numpy?
Upvotes: 34
Views: 48653
Reputation: 1
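import numpy
a = numpy.array([[10, 0], [0, 0], [0, 10]])
# Boolean mask: True for rows whose elements are all zero
isZero = numpy.all(a == 0, axis=1)
# Keep only the rows that are not entirely zero
deleteFullZero = a[~isZero]
# isZero >> [False  True False]
# deleteFullZero >> [[10  0] [ 0 10]]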
Upvotes: 0
Reputation: 2110
Solution using np.sum, useful if you want to use a threshold:
a = np.array([[1.0, 1.0, 2.99],
[0.0000054, 0.00000078, 0.00000232],
[0, 0, 0],
[1, 1, 0.0],
[0.0, 0.0, 0.0]])
print(np.where(np.sum(np.abs(a), axis=1)==0)[0])
>>[2 4]
print(np.where(np.sum(np.abs(a), axis=1)<0.0001)[0])
>>[1 2 4]
Use np.prod to check whether a row contains at least one zero element:
print(np.where(np.prod(a, axis=1)==0)[0])
>>[2 3 4]
Upvotes: 1
Reputation: 19904
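The accepted answer tests for exact zeros, which is fine for integers and for floats that are exactly 0.0. If you want to find rows where all the values are floats that may only be approximately zero (for example after rounding error), use np.isclose():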
print(x)
# output
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
1., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
])
np.where(np.all(np.isclose(x, 0), axis=1))
(array([ 0, 3]),)
Note: this also works with PyTorch Tensors, which is nice for when you want to find zeroed multihot encoding vectors.
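To illustrate how an exact test and np.isclose() can differ on floats, here is a minimal sketch. The array x below is made-up data for illustration only, and np.isclose() is used with its default absolute tolerance (atol=1e-8), which may or may not suit your data.

```python
import numpy as np

# Hypothetical data: row 0 is exactly zero, row 1 is zero up to tiny noise,
# row 2 contains a clearly nonzero value.
x = np.array([[0.0, 0.0, 0.0],
              [0.0, 1e-12, 0.0],
              [0.0, 1.0, 0.0]])

# Exact test: a row counts as zero only if every element is exactly 0
exact = np.where(~x.any(axis=1))[0]

# Tolerant test: a row counts as zero if every element is within atol of 0
approx = np.where(np.all(np.isclose(x, 0), axis=1))[0]

print(exact)   # [0]
print(approx)  # [0 1]
```

If the default tolerance is too loose or too strict, the atol argument of np.isclose() can be set explicitly.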
Upvotes: 2
Reputation: 114956
Here's one way. I assume numpy has been imported using import numpy as np.
In [20]: a
Out[20]:
array([[0, 1, 0],
[1, 0, 1],
[0, 0, 0],
[1, 1, 0],
[0, 0, 0]])
In [21]: np.where(~a.any(axis=1))[0]
Out[21]: array([2, 4])
It's a slight variation of this answer: How to check that a matrix contains a zero column?
Here's what's going on:
The any method returns True if any value in the array is "truthy". Nonzero numbers are considered True, and 0 is considered False. By using the argument axis=1, the method is applied to each row. For the example a, we have:
In [32]: a.any(axis=1)
Out[32]: array([ True, True, False, True, False], dtype=bool)
So each value indicates whether the corresponding row contains a nonzero value. The ~ operator is the bitwise "not", which for a boolean array acts as the element-wise logical complement:
In [33]: ~a.any(axis=1)
Out[33]: array([False, False, True, False, True], dtype=bool)
(An alternative expression that gives the same result is (a == 0).all(axis=1).)
To get the row indices, we use the where function. It returns the indices where its argument is True:
In [34]: np.where(~a.any(axis=1))
Out[34]: (array([2, 4]),)
Note that where returned a tuple containing a single array. where works for n-dimensional arrays, so it always returns a tuple with one index array per dimension. Our mask is one-dimensional, so we want the single array in that tuple.
In [35]: np.where(~a.any(axis=1))[0]
Out[35]: array([2, 4])
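The steps above can be collected into a small helper. This is only a sketch: the function name zero_row_indices is hypothetical, and np.flatnonzero is swapped in as a shortcut for np.where(...)[0] on a one-dimensional mask.

```python
import numpy as np

def zero_row_indices(a):
    """Return the indices of the rows of a 2-D array whose elements are all zero."""
    a = np.asarray(a)
    # any(axis=1) is True for every row with at least one nonzero element,
    # so its complement marks the rows that are entirely zero
    mask = ~a.any(axis=1)
    # flatnonzero returns the positions of the True entries of the 1-D mask
    return np.flatnonzero(mask)

a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 0, 0],
              [1, 1, 0],
              [0, 0, 0]])

result = zero_row_indices(a)
print(result)  # [2 4]
```

Both steps are vectorized, so no Python-level loop over the rows is needed.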
Upvotes: 66