Reputation: 21602
Suppose we have an array like a, and we want to find the first non-zero row in it. a can be large, e.g. a single-channel image.
a = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0], [2, 3, 2]])
array([[0, 0, 0],
       [0, 0, 0],
       [0, 1, 0],
       [2, 3, 2]])
What is the fastest and most elegant way to do this in NumPy?
For now I'm doing it like:
row_idx = np.argmin(np.sum(a, axis=1)==0)
Upvotes: 4
Views: 2506
Reputation: 53029
Here is a method (pp in the table below, first_nz_row in the code) that is quite fast, but its fast path only works for contiguous arrays; non-contiguous arrays fall back to a slower copy. It view-casts the data to bool and takes advantage of short-circuiting in argmax. In the comparison below I've taken the liberty of fixing the other answers so that they correctly handle all-zero inputs.
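To make the view-casting idea concrete, here is a minimal sketch on the array from the question. It assumes an explicit int64 dtype so that each element occupies 8 bytes; the variable names are only illustrative.

```python
import numpy as np

a = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0], [2, 3, 2]], dtype=np.int64)

# Reinterpret the buffer as one bool per byte; for a C-contiguous array
# ravel() returns a view, so no data is copied.
b = a.ravel().view(bool)
bytes_per_row = a.shape[1] * a.itemsize   # 3 elements * 8 bytes = 24

first_byte = b.argmax()                   # position of the first nonzero byte
row = first_byte // bytes_per_row         # byte offset -> row index
print(b.size, row)                        # 96 2
```

Note that this tests raw bytes rather than values, so it can disagree with a value-level test such as a != 0 for inputs like a floating-point -0.0, whose byte pattern is not all zero.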
Results (time per call in µs):

                                 pp    galaxyan   WeNYoBen1   WeNYoBen2
contiguous small sparse    1.863220    1.465050    3.522510    4.861850
           large dense     2.086379  865.158230   68.337360   42.832701
                 medium    2.136710  726.706850   71.640330   43.047541
                 sparse   11.146050  694.993751   71.333189   42.406949
non cont.  small sparse    1.683651    1.516769    3.193740    4.017490
           large dense    55.097940  433.429850   64.628370   72.984670
                 medium   60.434350  397.200490   67.545200   51.276210
                 sparse   61.433990  387.847329   67.141630   45.788040
Code:
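import numpy as np

def first_nz_row(a):
    if a.flags.c_contiguous:
        # View the raw buffer as one bool per byte (no copy) and find the first nonzero byte.
        b = a.ravel().view(bool)
        res = b.argmax()
        # Convert the byte offset to a row index; return a.shape[0] if everything is zero.
        return res // (a.shape[1]*a.itemsize) if res or b[res] else a.shape[0]
    else:
        # Non-contiguous input: fall back to a bool copy with one entry per element.
        b = a.astype(bool).ravel()
        res = b.argmax()
        return res // a.shape[1] if res or b[res] else a.shape[0]

# galaxyan's answer, with all-zero handling added
def use_nz(a):
    b = np.nonzero(a)[0]
    return b[0] if b.size else a.shape[0]

# WeNYoBen's first answer, with all-zero handling added
def any_max(a):
    b = a.any(1)
    res = b.argmax()
    return res if res or b[res] else a.shape[0]

# WeNYoBen's second answer, with all-zero handling added
def max_max(a):
    b = a.max(1).astype(bool)
    res = b.argmax()
    return res if res or b[res] else a.shape[0]
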
from timeit import timeit

A = [np.random.uniform(-R, 1, (N,M)).clip(0,None)
     for R,N,M in [[100,2,2], [10,100,1000], [1000,100,1000], [10000,100,1000]]]

t = 10000*np.array(
    [[timeit(f, number=100) for f in (lambda: first_nz_row(a),
                                      lambda: use_nz(a),
                                      lambda: any_max(a),
                                      lambda: max_max(a))]
     for a in A] +
    [[timeit(f, number=100) for f in (lambda: first_nz_row(a),
                                      lambda: use_nz(a),
                                      lambda: any_max(a),
                                      lambda: max_max(a))]
     for a in [a[:,::2] for a in A]])

import pandas as pd

index = "dense medium sparse".split()
index = pd.MultiIndex(
    [['contiguous', 'non cont.'], ['small', 'large'], index],
    [np.repeat((0,1),4), np.repeat((0,1,0,1,),(1,3,1,3)), np.r_[2, :3, 2, :3]])
t = pd.DataFrame(t, columns="pp galaxyan WeNYoBen1 WeNYoBen2".split(), index=index)
print(t)
Upvotes: 4
Reputation: 323226
What I would do:
a.any(1).argmax()
2
Or
a.max(1).astype(bool).argmax()
2
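One caveat with both one-liners: argmax also returns 0 when no row contains a nonzero value, so the result is ambiguous for an all-zero array. Below is a minimal sketch of a guarded version; the function name first_nonzero_row and the None sentinel are illustrative choices, and the array is assumed to have at least one row.

```python
import numpy as np

def first_nonzero_row(a):
    """Index of the first row with any nonzero entry, or None if there is none."""
    mask = a.any(axis=1)       # one bool per row
    idx = int(mask.argmax())   # first True, or 0 if mask is all False
    return idx if mask[idx] else None

a = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0], [2, 3, 2]])
print(first_nonzero_row(a))                  # 2
print(first_nonzero_row(np.zeros((4, 3))))   # None
```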
Upvotes: 0
Reputation: 6111
np.nonzero finds all elements that are not zero and returns their row and column indices. The first entry of the row-index array is the first non-zero row:
np.nonzero(a)[0][0]
2
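For illustration, this is what np.nonzero returns for the array in the question, together with a guard for the all-zero case, where indexing with [0][0] would raise an IndexError. The None sentinel is just one possible choice.

```python
import numpy as np

a = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0], [2, 3, 2]])

rows, cols = np.nonzero(a)   # indices of every nonzero element, in row-major order
print(rows)                  # [2 3 3 3]
print(cols)                  # [1 0 1 2]

# Guard against an all-zero array, where rows would be empty.
first_row = int(rows[0]) if rows.size else None
print(first_row)             # 2
```

Since np.nonzero builds index arrays for every nonzero element, it always scans the whole array, which matters for large inputs.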
Upvotes: 1