mrgloom

Reputation: 21602

Find first non-zero row in numpy

Suppose we have an array like a and want to find the first non-zero row in it. a can be large, e.g. a single-channel image.

a = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0], [2, 3, 2]])

array([[0, 0, 0],
       [0, 0, 0],
       [0, 1, 0],
       [2, 3, 2]])

What is the fastest and most elegant way to do this in numpy?

For now I'm doing it like this:

row_idx = np.argmin(np.sum(a, axis=1)==0)
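One caveat with the sum-based test: a row whose entries cancel out sums to zero and is skipped. Here is a minimal sketch, assuming the array may hold negative values; the array c is made up for illustration and is not part of the original data:

```python
import numpy as np

# Hypothetical input: row 1 is non-zero, but its entries cancel out.
c = np.array([[0, 0, 0], [1, -1, 0], [2, 3, 2]])

# Sum-based test from the question: row 1 sums to 0, so it is treated as a zero row.
by_sum = int(np.argmin(np.sum(c, axis=1) == 0))

# Testing each row with any() is not affected by cancellation.
by_any = int(np.argmax(c.any(axis=1)))

print(by_sum, by_any)  # 2 1
```

For non-negative data such as image intensities the two agree, so this only matters for signed inputs.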

Upvotes: 4

Views: 2506

Answers (3)

Paul Panzer

Reputation: 53029

Here is a method (pp in the results below, first_nz_row in the code) that is quite fast, but only for contiguous arrays; for non-contiguous input it falls back to a copying astype(bool). It uses view casting to bool and takes advantage of short-circuiting in argmax. In the comparison below I've taken the liberty of fixing the other answers so that they handle all-zero inputs correctly.
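To make the view-casting idea concrete, here is a small sketch on the array from the question. It assumes a C-contiguous int64 array; the variable names are illustrative only:

```python
import numpy as np

a = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0], [2, 3, 2]], dtype=np.int64)

# Reinterpret the raw buffer as bool: one bool per byte, no copy is made.
b = a.ravel().view(bool)

# Position of the first non-zero byte in the buffer.
first_byte = int(b.argmax())

# Convert the byte offset to a row index: each row occupies
# a.shape[1] * a.itemsize bytes.
row = first_byte // (a.shape[1] * a.itemsize)

print(row)  # 2
```

Because the test is byte-wise, a floating-point -0.0 (which has its sign bit set) would presumably be reported as non-zero under this view, so the trick seems best suited to data where that cannot occur.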

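Results (time per call in µs; columns: pp = first_nz_row, galaxyan = use_nz, WeNYoBen1 = any_max, WeNYoBen2 = max_max):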

                                pp    galaxyan  WeNYoBen1  WeNYoBen2
contiguous small sparse   1.863220    1.465050   3.522510   4.861850
           large dense    2.086379  865.158230  68.337360  42.832701
                 medium   2.136710  726.706850  71.640330  43.047541
                 sparse  11.146050  694.993751  71.333189  42.406949
non cont.  small sparse   1.683651    1.516769   3.193740   4.017490
           large dense   55.097940  433.429850  64.628370  72.984670
                 medium  60.434350  397.200490  67.545200  51.276210
                 sparse  61.433990  387.847329  67.141630  45.788040

Code:

import numpy as np

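def first_nz_row(a):
    # Returns the index of the first non-zero row, or a.shape[0] if there is none.
    if a.flags.c_contiguous:
        # Reinterpret the buffer as one bool per byte (no copy).
        b = a.ravel().view(bool)
        res = b.argmax()
        # res is a byte offset, so divide by the number of bytes per row.
        return res // (a.shape[1]*a.itemsize) if res or b[res] else a.shape[0]
    else:
        # Non-contiguous input: astype(bool) copies, giving one bool per element.
        b = a.astype(bool).ravel()
        res = b.argmax()
        return res // a.shape[1] if res or b[res] else a.shape[0]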

def use_nz(a):
    b = np.nonzero(a)[0]
    return b[0] if b.size else a.shape[0]

def any_max(a):
    b = a.any(1)
    res = b.argmax()
    return res if res or b[res] else a.shape[0]

def max_max(a):
    b = a.max(1).astype(bool)
    res = b.argmax()
    return res if res or b[res] else a.shape[0]

from timeit import timeit


A = [np.random.uniform(-R, 1, (N,M)).clip(0,None)
     for R,N,M in [[100,2,2], [10,100,1000], [1000,100,1000], [10000,100,1000]]]
# timeit returns total seconds for 100 calls; multiplying by 10000 gives µs per call
t = 10000*np.array(
    [[timeit(f, number=100) for f in (lambda: first_nz_row(a),
                                      lambda: use_nz(a),
                                      lambda: any_max(a),
                                      lambda: max_max(a))]
     for a in A] +
    [[timeit(f, number=100) for f in (lambda: first_nz_row(a),
                                      lambda: use_nz(a),
                                      lambda: any_max(a),
                                      lambda: max_max(a))]
     for a in [a[:,::2] for a in A]])

import pandas as pd
index = "dense medium sparse".split()
index = pd.MultiIndex(
    [['contiguous', 'non cont.'], ['small', 'large'], index],
    [np.repeat((0,1),4), np.repeat((0,1,0,1,),(1,3,1,3)), np.r_[2, :3, 2, :3]])
t = pd.DataFrame(t, columns="pp galaxyan WeNYoBen1 WeNYoBen2".split(), index=index)
print(t)

Upvotes: 4

BENY

Reputation: 323226

What I would do:

a.any(1).argmax()
2

Or:

a.max(1).astype(bool).argmax()
2
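Note that argmax returns 0 when no row is non-zero, which is indistinguishable from "row 0 is non-zero". A minimal sketch of a guard, assuming None is an acceptable "not found" value (that choice is mine, not part of the answer above):

```python
import numpy as np

z = np.zeros((3, 4), dtype=int)  # all-zero input

mask = z.any(axis=1)
idx = int(mask.argmax())  # 0, even though row 0 is all zeros

# Guard: confirm the reported row really is non-zero.
first = idx if mask[idx] else None

print(idx, first)  # 0 None
```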

Upvotes: 0

galaxyan

Reputation: 6111

np.nonzero finds all non-zero items and returns their row and column indices, so the first row index is:

np.nonzero(a)[0][0]

2
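On an all-zero array the index arrays are empty, so the [0][0] lookup raises an IndexError. A small sketch of a guarded version; the helper name first_nonzero_row and the None return value are hypothetical choices for illustration:

```python
import numpy as np

def first_nonzero_row(a):
    """Hypothetical helper: row index of the first non-zero entry, or None."""
    # np.nonzero returns indices in row-major order, so rows[0] is the first row.
    rows = np.nonzero(a)[0]
    return int(rows[0]) if rows.size else None

a = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0], [2, 3, 2]])
print(first_nonzero_row(a))                 # 2
print(first_nonzero_row(np.zeros((2, 2))))  # None
```

This still scans the whole array and builds index arrays for every non-zero entry, so it is likely to be slow on large, dense inputs.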

Upvotes: 1
