Reputation: 42459
Some weeks ago I posted a question (Speed up nested for loop with elements exponentiation) which got a very good answer by abarnert. This question is related to that one since it makes use of the performance improvements suggested by said user.
I need to improve the performance of a function that involves calculating three factors and then applying an exponential on them.
Here's a MWE
of my code:
import numpy as np
import timeit
def random_data(N):
# Generate some random data.
return np.random.uniform(0., 10., N)
# Data lists.
array1 = np.array([random_data(4) for _ in range(1000)])
array2 = np.array([random_data(3) for _ in range(2000)])
# Function.
def func():
# Empty list that holds all values obtained in for loop.
lst = []
for elem in array1:
# Avoid numeric errors if one of these values is 0.
e_1, e_2 = max(elem[0], 1e-10), max(elem[1], 1e-10)
# Obtain three parameters.
A = 1./(e_1*e_2)
B = -0.5*((elem[2]-array2[:,0])/e_1)**2
C = -0.5*((elem[3]-array2[:,1])/e_2)**2
# Apply exponential.
value = A*np.exp(B+C)
# Store value in list.
lst.append(value)
return lst
# time function.
func_time = timeit.timeit(func, number=100)
print func_time
Is it possible to speed up func
without having to recurr to parallelization?
Upvotes: 1
Views: 143
Reputation: 229581
Here's what I have so far. My approach is to do as much of the math as possible across numpy arrays.
Optimizations:
A
s within numpyB
and C
by splitting them into factors, some of which can be computed within numpyCode:
def optfunc():
e0 = array1[:, 0]
e1 = array1[:, 1]
e2 = array1[:, 2]
e3 = array1[:, 3]
ar0 = array2[:, 0]
ar1 = array2[:, 1]
As = 1./(e0 * e1)
Bfactors = -0.5 * (1 / e0**2)
Cfactors = -0.5 * (1 / e1**2)
lst = []
for i, elem in enumerate(array1):
B = ((elem[2] - ar0) ** 2) * Bfactors[i]
C = ((elem[3] - ar1) ** 2) * Cfactors[i]
value = As[i]*np.exp(B+C)
lst.append(value)
return lst
print np.allclose(optfunc(), func())
# time function.
func_time = timeit.timeit(func, number=10)
opt_func_time = timeit.timeit(optfunc, number=10)
print "%.3fs --> %.3fs" % (func_time, opt_func_time)
Result:
True
0.759s --> 0.485s
At this point I'm stuck. I managed to do it entirely without python for loops, but it is slower than the above version for a reason I do not yet understand:
def optfunc():
x = array1
y = array2
x0 = x[:, 0]
x1 = x[:, 1]
x2 = x[:, 2]
x3 = x[:, 3]
y0 = y[:, 0]
y1 = y[:, 1]
A = 1./(x0 * x1)
Bfactors = -0.5 * (1 / x0**2)
Cfactors = -0.5 * (1 / x1**2)
B = (np.transpose([x2]) - y0)**2 * np.transpose([Bfactors])
C = (np.transpose([x3]) - y1)**2 * np.transpose([Cfactors])
return np.transpose([A]) * np.exp(B + C)
Result:
True
0.780s --> 0.558s
However note that the latter gets you an np.array
whereas the former only gets you a Python list... this might account for the difference but I'm not sure.
Upvotes: 4