Varun Nambigari

Reputation: 21

Why is numpy faster at finding non-zero elements in a matrix?

import numpy as np

def nonzero(a):
    row, colum = a.shape
    nonzero_row = np.array([], dtype=int)
    nonzero_col = np.array([], dtype=int)
    for i in range(0, row):
        for j in range(0, colum):
            if a[i, j] != 0:
                nonzero_row = np.append(nonzero_row, i)
                nonzero_col = np.append(nonzero_col, j)
    return (nonzero_row, nonzero_col)

The above code is much slower than

(row,col) = np.nonzero(edges_canny)

It would be great to get some direction on how to increase the speed, and an explanation of why NumPy functions are so much faster.

Upvotes: 1

Views: 983

Answers (1)

MSeifert

Reputation: 152637

There are two reasons why NumPy functions can outperform plain Python code:

  • The values inside the array are native types, not Python objects. This means NumPy doesn't need to go through the abstraction layer that Python has (see the short sketch after this list).
  • NumPy functions are (mostly) written in C. That actually only matters in some cases, because a lot of Python's built-in functions are also written in C, for example sum.
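
As a minimal illustration of the first point (an addition for clarity, not part of the original answer): every time you index a NumPy array from Python, the native value has to be boxed into a Python object, which is exactly the per-element overhead a vectorized call like np.nonzero avoids.

import numpy as np

arr = np.arange(5)

# Accessing a single element boxes the native value into a NumPy scalar
# object, so per-element Python loops pay this cost for every access.
print(type(arr[0]))           # <class 'numpy.int64'> (platform-dependent)

# A plain Python list already stores Python int objects.
print(type(arr.tolist()[0]))  # <class 'int'>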

In your case you also do something really inefficient: you append to a NumPy array inside a double loop. np.append copies the entire array every time it is called (a short demonstration follows the rewritten function below), so doing it once per non-zero element is an obvious (and unnecessary) bottleneck. You would get a huge speedup just by using plain Python lists for nonzero_row and nonzero_col and only converting them to arrays just before you return:

def nonzero_list_based(a):
    row, colum = a.shape
    # Work on nested Python lists so element access doesn't create a
    # NumPy scalar object on every lookup.
    a = a.tolist()
    nonzero_row = []
    nonzero_col = []
    for i in range(0, row):
        for j in range(0, colum):
            if a[i][j] != 0:
                # Appending to a Python list is amortized O(1).
                nonzero_row.append(i)
                nonzero_col.append(j)
    return (np.array(nonzero_row), np.array(nonzero_col))
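
As a side note (a small addition for illustration, not from the original answer), it is easy to verify that np.append never grows an array in place; it always allocates and returns a fresh copy, which is why calling it inside the double loop is so costly:

import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)  # allocates a brand-new array and copies a into it
print(b)             # [1 2 3 4]
print(a)             # [1 2 3] -- the original array is unchanged
print(a is b)        # False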

The timings:

import numpy as np

def nonzero_original(a):      
    row,colum = a.shape
    nonzero_row = np.array([],dtype=int)
    nonzero_col = np.array([],dtype=int)
    for i in range(0,row):
        for j in range(0,colum):
            if a[i,j] != 0:
                nonzero_row = np.append(nonzero_row,i)
                nonzero_col = np.append(nonzero_col,j)
    return (nonzero_row,nonzero_col) 


arr = np.random.randint(0, 10, (100, 100))
%timeit np.nonzero(arr)
# 315 µs ± 5.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit nonzero_original(arr)
# 759 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit nonzero_list_based(arr)
# 13.1 ms ± 492 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Even though it's roughly 40 times slower than the NumPy operation, it's still nearly 60 times faster than your approach. There's an important lesson here: avoid np.append whenever possible!
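
As a quick sanity check (an addition for illustration, assuming the definitions and arr from the timing snippet above are in scope), all three approaches return exactly the same indices:

r1, c1 = np.nonzero(arr)
r2, c2 = nonzero_original(arr)
r3, c3 = nonzero_list_based(arr)

# All three produce identical row/column index arrays, in row-major order.
assert np.array_equal(r1, r2) and np.array_equal(r1, r3)
assert np.array_equal(c1, c2) and np.array_equal(c1, c3)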


One additional reason NumPy outperforms hand-written alternatives is that it (mostly) uses state-of-the-art algorithms (or "imports" them, i.e. BLAS/LAPACK/ATLAS/MKL) to solve the problems. These algorithms have been optimized for correctness and speed over years (if not decades). You shouldn't expect to find a faster or even comparable solution.
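
If you are curious which optimized backend your NumPy build uses (a side note, not part of the original answer), you can ask NumPy directly:

import numpy as np

# Prints the BLAS/LAPACK configuration NumPy was built against
# (e.g. OpenBLAS or MKL); the output varies by installation.
np.show_config()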

Upvotes: 1
