Reputation: 19776
N, M = 1000, 4000000
a = np.random.uniform(0, 1, (N, M))
k = np.random.randint(0, N, (N, M))
out = np.zeros((N, M))
for i in range(N):
for j in range(M):
out[k[i, j], j] += a[i, j]
I work with very long for-loops; %%timeit
on above with pass
replacing the operation yields
1min 19s ± 663 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
this is unacceptable in context (C++ took 6.5 sec). There's no reason for above to be done with Python objects; arrays have well-defined types. Implementing this in C/C++ as an extension is an overkill on both developer and user ends; I'm just passing arrays to loop and do arithmetic on.
Is there a way to tell Numpy "move this logic to C", or another library that can handle nested loops involving only arrays? I seek it for the general case, not workarounds for this specific example (but if you have one I can open a separate Q&A).
Upvotes: 7
Views: 1102
Reputation: 2260
This is basically the idea behind Numba. Not as fast as C, but it can get close... It uses a jit compiler to compile python code to machine and it's compatible with most Numpy functions. (In the docs you find all the details)
import numpy as np
from numba import njit
@njit
def f(N, M):
a = np.random.uniform(0, 1, (N, M))
k = np.random.randint(0, N, (N, M))
out = np.zeros((N, M))
for i in range(N):
for j in range(M):
out[k[i, j], j] += a[i, j]
return out
def f_python(N, M):
a = np.random.uniform(0, 1, (N, M))
k = np.random.randint(0, N, (N, M))
out = np.zeros((N, M))
for i in range(N):
for j in range(M):
out[k[i, j], j] += a[i, j]
return out
Pure Python:
%%timeit
N, M = 100, 4000
f_python(M, N)
338 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With Numba:
%%timeit
N, M = 100, 4000
f(M, N)
12 ms ± 534 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Upvotes: 5