Varun Nambigari

Reputation: 21

Why is numpy faster at finding non-zero elements in a matrix?

import numpy as np

def nonzero(a):
    row, colum = a.shape
    nonzero_row = np.array([], dtype=int)
    nonzero_col = np.array([], dtype=int)
    for i in range(0, row):
        for j in range(0, colum):
            if a[i, j] != 0:
                nonzero_row = np.append(nonzero_row, i)
                nonzero_col = np.append(nonzero_col, j)
    return (nonzero_row, nonzero_col)

The above code is much slower than

(row,col) = np.nonzero(edges_canny)

It would be great to get some direction on how to increase the speed, and an explanation of why NumPy functions are so much faster.

Upvotes: 1

Views: 983

Answers (1)

MSeifert

Reputation: 152637

There are two reasons why NumPy functions can outperform plain Python code:

  • The values inside the array are native types, not Python objects. This means NumPy doesn't need to go through the abstraction layer that Python has (see the short sketch after this list).
  • NumPy functions are (mostly) written in C. That actually only matters in some cases, because a lot of Python's built-in functions are also written in C, for example sum.
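
As a minimal illustration of the first point (an addition for clarity, not part of the original answer): every time you index a NumPy array from Python, the native value has to be boxed into a Python object, which is exactly the per-element overhead a vectorized call like np.nonzero avoids.

import numpy as np

arr = np.arange(5)

# Accessing a single element boxes the native value into a NumPy scalar
# object, so per-element Python loops pay this cost for every access.
print(type(arr[0]))           # <class 'numpy.int64'> (platform-dependent)

# A plain Python list already stores Python int objects.
print(type(arr.tolist()[0]))  # <class 'int'>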

In your case you also do something really inefficient: you append to a NumPy array inside a double loop. np.append copies the entire array every time it is called (a short demonstration follows the rewritten function below), so doing it once per non-zero element is an obvious (and unnecessary) bottleneck. You would get a huge speedup just by using plain Python lists for nonzero_row and nonzero_col and only converting them to arrays just before you return:

def nonzero_list_based(a):
    row, colum = a.shape
    # Work on nested Python lists so element access doesn't create a
    # NumPy scalar object on every lookup.
    a = a.tolist()
    nonzero_row = []
    nonzero_col = []
    for i in range(0, row):
        for j in range(0, colum):
            if a[i][j] != 0:
                # Appending to a Python list is amortized O(1).
                nonzero_row.append(i)
                nonzero_col.append(j)
    return (np.array(nonzero_row), np.array(nonzero_col))
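
As a side note (a small addition for illustration, not from the original answer), it is easy to verify that np.append never grows an array in place; it always allocates and returns a fresh copy, which is why calling it inside the double loop is so costly:

import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)  # allocates a brand-new array and copies a into it
print(b)             # [1 2 3 4]
print(a)             # [1 2 3] -- the original array is unchanged
print(a is b)        # False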

The timings:

import numpy as np

def nonzero_original(a):      
    row,colum = a.shape
    nonzero_row = np.array([],dtype=int)
    nonzero_col = np.array([],dtype=int)
    for i in range(0,row):
        for j in range(0,colum):
            if a[i,j] != 0:
                nonzero_row = np.append(nonzero_row,i)
                nonzero_col = np.append(nonzero_col,j)
    return (nonzero_row,nonzero_col) 


arr = np.random.randint(0, 10, (100, 100))
%timeit np.nonzero(arr)
# 315 µs ± 5.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit nonzero_original(arr)
# 759 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit nonzero_list_based(arr)
# 13.1 ms ± 492 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Even though it's roughly 40 times slower than the NumPy operation, it's still nearly 60 times faster than your approach. There's an important lesson here: avoid np.append whenever possible!
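
As a quick sanity check (an addition for illustration, assuming the definitions and arr from the timing snippet above are in scope), all three approaches return exactly the same indices:

r1, c1 = np.nonzero(arr)
r2, c2 = nonzero_original(arr)
r3, c3 = nonzero_list_based(arr)

# All three produce identical row/column index arrays, in row-major order.
assert np.array_equal(r1, r2) and np.array_equal(r1, r3)
assert np.array_equal(c1, c2) and np.array_equal(c1, c3)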


One additional reason NumPy outperforms hand-written alternatives is that it (mostly) uses state-of-the-art algorithms (or "imports" them, i.e. BLAS/LAPACK/ATLAS/MKL) to solve the problems. These algorithms have been optimized for correctness and speed over years (if not decades). You shouldn't expect to find a faster or even comparable solution.
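
If you are curious which optimized backend your NumPy build uses (a side note, not part of the original answer), you can ask NumPy directly:

import numpy as np

# Prints the BLAS/LAPACK configuration NumPy was built against
# (e.g. OpenBLAS or MKL); the output varies by installation.
np.show_config()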

Upvotes: 1
