Reputation: 4188
While experimenting with parallelising three nested for loops with numba, I realised that a naive approach does not actually improve performance. The following code produces these times (in seconds):
0.154625177383 # no numba
0.420143127441 # numba first time (lazy initialisation)
0.196285963058 # numba second time
0.200047016144 # numba third time
0.199403047562 # numba fourth time
Any idea what I am doing wrong?
import numpy as np
from numba import jit, prange
import time

def run_1():
    dims = [100, 100, 100]
    a = np.zeros(dims)
    for x in range(100):
        for y in range(100):
            for z in range(100):
                a[x, y, z] = 1
    return a

@jit
def run_2():
    dims = [100, 100, 100]
    a = np.zeros(dims)
    for x in prange(100):
        for y in prange(100):
            for z in prange(100):
                a[x, y, z] = 1
    return a

if __name__ == '__main__':
    t = time.time()
    run_1()
    elapsed1 = time.time() - t
    print(elapsed1)

    t = time.time()
    run_2()
    elapsed2 = time.time() - t
    print(elapsed2)

    t = time.time()
    run_2()
    elapsed3 = time.time() - t
    print(elapsed3)

    t = time.time()
    run_2()
    elapsed4 = time.time() - t
    print(elapsed4)

    t = time.time()
    run_2()
    elapsed5 = time.time() - t
    print(elapsed5)
Upvotes: 1
Views: 105
Reputation: 40904
I wonder whether there is any code to JIT in these loops: there is no non-trivial Python code to compile, only thin wrappers over C code (yes, range is C code). Possibly the JIT only adds overhead by trying to profile and generate (unsuccessfully) more efficient code.
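Also, as far as I know, prange only takes effect when the function is compiled with parallel=True; under a bare @jit it behaves like plain range. A minimal sketch of what that could look like (run_3 is a hypothetical name; numba parallelizes only the outermost prange loop):

import numpy as np
from numba import njit, prange

# Hypothetical variant: parallel=True is required for prange to
# actually distribute work across threads.
@njit(parallel=True)
def run_3():
    a = np.zeros((100, 100, 100))
    for x in prange(100):      # only this outermost loop is parallelized
        for y in range(100):   # inner loops stay serial
            for z in range(100):
                a[x, y, z] = 1
    return a

run_3()  # first call triggers compilation
run_3()  # time subsequent calls instead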
If you want speed-up, think about parallelization using scipy or maybe direct access to NumPy arrays from Cython.
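For this specific fill there is arguably no loop needed at all; a minimal sketch of the vectorized equivalent, which runs entirely in NumPy's C code:

import numpy as np

# Equivalent to the triple loop above, with no Python-level iteration:
a = np.ones((100, 100, 100))

# Or, filling an existing array in place:
a = np.zeros((100, 100, 100))
a[:] = 1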
Upvotes: 1