Reputation: 225
I'm filling a 3D-array with a function that depends on the values of other 1D-arrays as illustrated in the code below. The code involving my real data takes forever because the length of my 1d-arrays (and hence my 3D-array) is of order 1 million. Is there any way to do this much faster, for example without using loops in python ?
An idea that might seem stupid but I'm still wondering if it wouldn't be faster to fill in this object importing code in C++ in my program... I'm new to C++ so I did not try it out.
import numpy as np
import time
start_time = time.time()
kx = np.linspace(0,400,100)
ky = np.linspace(0,400,100)
kz = np.linspace(0,400,100)
Kh = np.empty((len(kx),len(ky),len(kz)))
for i in range(len(kx)):
for j in range(len(ky)):
for k in range(len(kz)):
if np.sqrt(kx[i]**2+ky[j]**2) != 0:
Kh[i][j][k] = np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2)
else:
Kh[i][j][k] = 1
print('Finished in %s seconds' % (time.time() - start_time))
Upvotes: 2
Views: 1415
Reputation: 801
A basic rule for using numpy is: use matrix operation instead of for loops whenever possible.
import numpy as np
import time
kx = np.linspace(0,400,100)
ky = np.linspace(0,400,100)
kz = np.linspace(0,400,100)
Kh = np.empty((len(kx),len(ky),len(kz)))
def func_matrix_operation(kx, ky, kz, _):
kx_ = np.expand_dims(kx ** 2, 1) # shape: (100, 1)
ky_ = np.expand_dims(ky ** 2, 0) # shape: (1, 100)
# Make use of broadcasting such that kxy[i, j] = kx[i] ** 2 + ky[j] ** 2
kxy = kx_ + ky_ # shape: (100, 100)
kxy_ = np.expand_dims(kxy, 2) # shape: (100, 100, 1)
kz_ = np.reshape(kz ** 2, (1, 1, len(kz))) # shape: (1, 1, 100)
kxyz = kxy_ + kz_ # kxyz[i, j, k] = kx[i] ** 2 + ky[j] ** 2 + kz[k] ** 2
kh = np.sqrt(kxyz)
kh[kxy == 0] = 1
return kh
start_time = time.time()
Kh1 = func_matrix_operation(kx, ky, kz, Kh)
print('Matrix operation Finished in %s seconds' % (time.time() - start_time))
def func_normal(kx, ky, kz, Kh):
for i in range(len(kx)):
for j in range(len(ky)):
for k in range(len(kz)):
if np.sqrt(kx[i] ** 2 + ky[j] ** 2) != 0:
Kh[i][j][k] = np.sqrt(kx[i] ** 2 + ky[j] ** 2 + kz[k] ** 2)
else:
Kh[i][j][k] = 1
return Kh
start_time = time.time()
Kh2 = func_normal(kx, ky, kz, Kh)
print('Normal function Finished in %s seconds' % (time.time() - start_time))
assert np.array_equal(Kh1, Kh2)
The output is:
Matrix operation Finished in 0.018651008606 seconds
Normal function Finished in 5.78078794479 seconds
Upvotes: 0
Reputation: 39042
You can use the @njit
decorator from numba
, a high peformance JIT compiler. It reduces the time by more than one order of magnitude. Below is the comparison and the code. It is as simple as importing njit
and then just using @njit
as the decorator to your function. This is the official website.
I also computed time for 1000*1000*1000
data points using njit
and it took just 17.856173038482666 seconds. Using parallel version as @njit(parallel=True)
further reduces the time to 9.36257791519165 seconds. Doing the same with normal function would take several minutes.
I also did some time comparison of njit
and the matrix operation as suggested by @Bily in his answer below. While the times are comparable for number of points up to 700, the njit
method clearly wins for larger number of points > 700 as you can see in the figure below.
import numpy as np
import time
from numba import njit
kx = np.linspace(0,400,100)
ky = np.linspace(0,400,100)
kz = np.linspace(0,400,100)
Kh = np.empty((len(kx),len(ky),len(kz)))
@njit # <----- Decorating your function here
def func_njit(kx, ky, kz, Kh):
for i in range(len(kx)):
for j in range(len(ky)):
for k in range(len(kz)):
if np.sqrt(kx[i]**2+ky[j]**2) != 0:
Kh[i][j][k] = np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2)
else:
Kh[i][j][k] = 1
return Kh
start_time = time.time()
Kh = func_njit(kx, ky, kz, Kh)
print('NJIT Finished in %s seconds' % (time.time() - start_time))
def func_normal(kx, ky, kz, Kh):
for i in range(len(kx)):
for j in range(len(ky)):
for k in range(len(kz)):
if np.sqrt(kx[i]**2+ky[j]**2) != 0:
Kh[i][j][k] = np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2)
else:
Kh[i][j][k] = 1
return Kh
start_time = time.time()
Kh = func_normal(kx, ky, kz, Kh)
print('Normal function Finished in %s seconds' % (time.time() - start_time))
NJIT Finished in 0.36797094345092773 seconds
Normal function Finished in 5.540749788284302 seconds
Comparison of njit
and the matrix method
Upvotes: 1