Faster for-loops with arrays in Python

Question

import numpy as np

N, M = 1000, 4000000
a = np.random.uniform(0, 1, (N, M))
k = np.random.randint(0, N, (N, M))

out = np.zeros((N, M))
for i in range(N):
    for j in range(M):
        out[k[i, j], j] += a[i, j]

I work with very long for-loops; running %%timeit on the above, with pass replacing the operation, yields

1min 19s ± 663 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This is unacceptable in context (C++ took 6.5 sec). There's no reason for the above to be done with Python objects; arrays have well-defined types. Implementing this in C/C++ as an extension is overkill on both the developer and user ends; I'm just passing arrays to a loop and doing arithmetic on them.

Is there a way to tell NumPy "move this logic to C", or is there another library that can handle nested loops involving only arrays? I'm looking for the general case, not workarounds for this specific example (but if you have one I can open a separate Q&A).

dzang · Accepted Answer

This is basically the idea behind Numba. It's not as fast as C, but it can get close. It uses a JIT compiler to compile Python code to machine code, and it's compatible with most NumPy functions. (The docs have all the details.)

import numpy as np
from numba import njit


@njit
def f(N, M):
    a = np.random.uniform(0, 1, (N, M))
    k = np.random.randint(0, N, (N, M))

    out = np.zeros((N, M))
    for i in range(N):
        for j in range(M):
            out[k[i, j], j] += a[i, j]
    return out


def f_python(N, M):
    a = np.random.uniform(0, 1, (N, M))
    k = np.random.randint(0, N, (N, M))

    out = np.zeros((N, M))
    for i in range(N):
        for j in range(M):
            out[k[i, j], j] += a[i, j]
    return out
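The two functions above build their random inputs inside the function body. Since the question is about passing existing arrays to a loop, here is a minimal sketch of the same kernel taking the arrays as arguments. The name scatter_add, the small array sizes, and the try/except fallback (so the sketch still runs, without any speed-up, when Numba is not installed) are my own assumptions, not part of the original answer. The np.add.at call is only there as a reference to check the loop against.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs
    # (without any speed-up) when Numba is not installed.
    def njit(func):
        return func


@njit
def scatter_add(a, k):
    # Same nested loop as in the question, operating on arrays passed in.
    n, m = a.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            out[k[i, j], j] += a[i, j]
    return out


# Small illustrative sizes (hypothetical, chosen so the check runs quickly).
N, M = 50, 80
a = np.random.uniform(0, 1, (N, M))
k = np.random.randint(0, N, (N, M))

result = scatter_add(a, k)

# Reference result using NumPy's unbuffered scatter-add.
expected = np.zeros((N, M))
np.add.at(expected, (k, np.arange(M)), a)

print(np.allclose(result, expected))
```

Keeping the array creation outside the compiled function also means the timings measure only the loop itself.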

Pure Python:

%%timeit

N, M = 100, 4000
f_python(M, N)

338 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With Numba:

%%timeit

N, M = 100, 4000
f(M, N)

12 ms ± 534 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
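One thing to keep in mind when timing: the first call to an @njit function includes the compilation itself, so it is usually much slower than later calls. A rough sketch of how to separate the two, assuming the arrays are passed in as arguments (the name scatter_add and the fallback for a missing Numba install are illustrative assumptions, not from the answer above):

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: plain-Python fallback so the sketch runs without Numba.
    def njit(func):
        return func


@njit
def scatter_add(a, k):
    # Same loop as above, with the inputs passed in as arguments.
    n, m = a.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            out[k[i, j], j] += a[i, j]
    return out


N, M = 100, 4000
a = np.random.uniform(0, 1, (N, M))
k = np.random.randint(0, N, (N, M))

# First call: with Numba installed this includes JIT compilation time.
start = time.perf_counter()
scatter_add(a, k)
first_call = time.perf_counter() - start

# Second call: reuses the already compiled machine code.
start = time.perf_counter()
out = scatter_add(a, k)
second_call = time.perf_counter() - start

print(f"first call:  {first_call:.4f} s")
print(f"second call: {second_call:.4f} s")
```

For one-off scripts the compile cost can matter; for long loops like the one in the question it should be small next to the run time.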
