Reputation: 107
I would like to speed up the following code related to spherical modes. It is a simplification of my actual code (I didn't want to oversimplify it because it can lead to solutions that are not valid for my actual problem):
import numpy as np
import time
import math
def function_call(npp,nmax):
matrix_a = np.random.rand(npp)
matrix_b = np.random.rand(npp)
a=np.random.rand()
F = np.zeros((2*npp, 2*nmax*(nmax+2)),dtype=np.complex_)
npa=np.arange(npp)
for n in range(1,nmax+1,1):
a_n = np.sqrt(1 / (2 * np.pi * n * (n + 1)))
for m in range(-n,n+1,1):
b_m = (-1)**((np.abs(m) + m) / 2)
p_mn = int(1 / 2 * (np.abs(m) + n + 1 / 2 * (1 - (-1)**(np.abs(m) + n))))
alpha_mn = np.sqrt(((2 * n + 1) / 2) * math.factorial(n - np.abs(m)) / math.factorial(n + np.abs(m)))
A_mn = np.zeros(npp)
B_mn = np.zeros(npp)
for p in range(p_mn,n+1,1):
Cai_pmn = math.factorial(n) * ((-1)**(n + p)) / (math.factorial(p) * math.factorial(n - p)) * math.factorial(2 * p)/math.factorial(2 * p - np.abs(m) - n)
A_mn = A_mn + Cai_pmn * (np.cos(matrix_a))**(2 * p - np.abs(m) - n)
B_mn = B_mn + (2 * p - np.abs(m) - n) * Cai_pmn * (np.cos(matrix_a))**(np.abs(2 * p - np.abs(m) - n - 1))
A_mn = A_mn / (2**n * math.factorial(n))
B_mn = B_mn / (2**n * math.factorial(n))
S_mn = alpha_mn * m * A_mn * np.sin(matrix_a)**np.abs(np.abs(m) - 1)
D_mn = alpha_mn * (np.abs(m) * A_mn * np.cos(matrix_a) * (np.sin(matrix_a))**(np.abs(np.abs(m) - 1)) - B_mn * (np.sin(matrix_a))**(np.abs(m) + 1))
h1 = 1j**(n+1)*np.exp(-1j*a)/(a)
h2 = 1j**(n)*np.exp(-1j*a)/(a)
F_s1_theta = 1j * a_n * b_m * h1 * (S_mn * np.exp(1j * m * matrix_b))
F_s1_phi = -a_n * b_m * h1 * (D_mn * np.exp(1j * m * matrix_b))
F_s2_theta = a_n * b_m * h2 * (D_mn * np.exp(1j * m * matrix_b))
F_s2_phi = 1j * a_n * b_m * h2 * (S_mn * np.exp(1j * m * matrix_b))
j = 2 * (n * (n + 1) + m - 1)
F[2 * npa, j] = F_s1_theta[npa]
F[2 * npa+1 , j] = F_s1_phi[npa]
j = 2 * (n * (n + 1) + m - 1) + 1
F[2 * npa, j] = F_s2_theta[npa]
F[2 * npa+1, j] = F_s2_phi[npa]
prev_time_ep =time.time()
npp=500
nmax=80
function_call(npp,nmax)
print(" --- %s seconds ---" % (time.time() - prev_time_ep))
The first option that I have tried is to vectorize it (it took me some time because it is not obvious). However, memory consumption grows rapidly and it is not efficient.
I have also tried with Numba and I actually succeed to reduce by 4 the previous time, but I was looking for larger improvements if it is possible.
I have read also that maybe multiprocessing or Cython could be good options. Maybe there is a way of vectorizing it without growing rapidly in memory usage?
Upvotes: 1
Views: 85
Reputation: 1463
first of all your code has O(n^3) complexity that is not a big deal.
A lot of this code can be done using array programming that will speed up a lot expecially using numpy.
I suggest to use a profiler to find not performing code and start to rewrite the code in vectrial out of loops.
I wrote a tool perf_tools useful for this kind of works. It can guide you in a sort of performance driven solution.
Upvotes: 0
Reputation: 1463
I've worked a little bit on your code, here the benchmark. the bottleneck is in the factorial computation.
================== PerfTool ==================
task |aver(s) |sum(s) |count |std
main loop | 0.134| 10.712| 80| 0.101
+-second loop | 0.134| 10.712| 80| 0.101
+-A | 0.000| 0.245| 6560| 0.000
+-B | 0.001| 5.648| 6560| 0.001
+-C | 0.000| 0.541| 6560| 0.000
+-D | 0.000| 1.505| 6560| 0.000
+-E | 0.000| 1.769| 6560| 0.000
+-F | 0.000| 0.867| 6560| 0.000
mx creation | 0.000| 0.000| 1| 0.000
preparation | 0.000| 0.000| 1| 0.000
overall | 0.03| 10.71| 39522|-
The B and C sentinel are:
with PerfTool('B'):
for p in range(p_mn,n+1,1):
Cai_pmn = math.factorial(n) * ((-1)**(n + p)) / (math.factorial(p) * math.factorial(n - p)) * math.factorial(2 * p)/math.factorial(2 * p - np.abs(m) - n)
A_mn = A_mn + Cai_pmn * (np.cos(matrix_a))**(2 * p - np.abs(m) - n)
B_mn = B_mn + (2 * p - np.abs(m) - n) * Cai_pmn * (np.cos(matrix_a))**(np.abs(2 * p - np.abs(m) - n - 1))
with PerfTool('C'):
A_mn = A_mn / (2**n * math.factorial(n))
B_mn = B_mn / (2**n * math.factorial(n))
As you can see most of the time is spent on B, so i've add a kind of cache, here:
rng = np.arange(1,nmax+1,1)
cache = dict(zip(rng,factorial(rng)))
def get_factorial(w,cache=cache):
if w not in cache:
cache[w] = math.factorial(w)
return cache[w]
To use instead of math.factorial, this avoid to recalculate same values.
Finally, B was refactored as B_vec, and here there is the root of the evil! i've marked that code as B_vec_slow, that 2 lines takes up most of the time.
with PerfTool('B_vec'):
prng = np.arange(p_mn, n+1)
Cai_pmn_vec = get_factorial(n) * ((-1)**(n + prng)) / (factorial(prng) * factorial(n - prng)) * factorial(2 * prng)/factorial(2 * prng - np.abs(m) - n)
with PerfTool('B_vec_slow'):
A_mn_vec = Cai_pmn_vec*np.power(cos_matrix_a[:,np.newaxis],2 * prng - np.abs(m) - n)
B_mn_vec = (2 * prng - np.abs(m) - n) * Cai_pmn_vec * np.power(cos_matrix_a[:,np.newaxis], np.abs(2 * prng - np.abs(m) - n - 1))
A_mn = np.sum(A_mn_vec,axis=1)
B_mn = np.sum(B_mn_vec,axis=1)
This is the result:
================== PerfTool ==================
task |aver(s) |sum(s) |count |std
main loop | 0.072| 5.736| 80| 0.052
+-second loop | 0.072| 5.735| 80| 0.052
+-A | 0.000| 0.194| 6560| 0.000
+-B_vec | 0.001| 3.490| 6560| 0.000
+-B_vec_slow | 0.000| 2.987| 6560| 0.000
+-C | 0.000| 0.126| 6560| 0.000
+-D | 0.000| 0.536| 6560| 0.000
+-E | 0.000| 0.768| 6560| 0.000
+-F | 0.000| 0.522| 6560| 0.000
preparation | 0.000| 0.000| 1| 0.000
mx creation | 0.000| 0.000| 1| 0.000
overall | 0.01| 5.74| 46082|-
If you can work on these 2 lines you can expect to run in 2/3 sec.
HERE: the optimized code: https://www.codepile.net/pile/8oDyGp6Q
Upvotes: 1
Reputation: 134
I'm not sure about which loops are faster than others in Python, but as I can see in your code, there are some duplicated function calls (same arguments).
For instance:
A_mn = A_mn / (2**n * math.factorial(n))
B_mn = B_mn / (2**n * math.factorial(n))
Both declarations share same denominator. Try to calculate first the factorial, save it into a variable and use it as denominator.
Also, np.exp() function is called multiple times, all with the same arguments:
np.exp(-1j*a)/(a)
np.exp(1j * m * matrix_b)
Numpy exponential function can be a little slow in execution time.
The same thing happens with all np.sin / np.cos(matrix_a) calls.
These changes won't provide you a huge improvement, but it's something:
def function_call_2(npp,nmax):
matrix_a = np.random.rand(npp)
matrix_b = np.random.rand(npp)
a=np.random.rand()
F = np.zeros((2*npp, 2*nmax*(nmax+2)),dtype=np.complex_)
npa=np.arange(npp)
# Auxiliary variables
sin_matrix_a = np.sin(matrix_a)
cos_matrix_a = np.cos(matrix_a)
for n in range(1,nmax+1,1):
# Auxilary variables
denominator = (2**n * math.factorial(n))
a_n = np.sqrt(1 / (2 * np.pi * n * (n + 1)))
for m in range(-n,n+1,1):
b_m = (-1)**((np.abs(m) + m) / 2)
p_mn = int(1 / 2 * (np.abs(m) + n + 1 / 2 * (1 - (-1)**(np.abs(m) + n))))
alpha_mn = np.sqrt(((2 * n + 1) / 2) * math.factorial(n - np.abs(m)) / math.factorial(n + np.abs(m)))
A_mn = np.zeros(npp)
B_mn = np.zeros(npp)
for p in range(p_mn,n+1,1):
Cai_pmn = math.factorial(n) * ((-1)**(n + p)) / (math.factorial(p) * math.factorial(n - p)) * math.factorial(2 * p)/math.factorial(2 * p - np.abs(m) - n)
A_mn = A_mn + Cai_pmn * (cos_matrix_a)**(2 * p - np.abs(m) - n)
B_mn = B_mn + (2 * p - np.abs(m) - n) * Cai_pmn * (cos_matrix_a)**(np.abs(2 * p - np.abs(m) - n - 1))
A_mn = A_mn / denominator
B_mn = B_mn / denominator
S_mn = alpha_mn * m * A_mn * sin_matrix_a**np.abs(np.abs(m) - 1)
D_mn = alpha_mn * (np.abs(m) * A_mn * cos_matrix_a * (sin_matrix_a)**(np.abs(np.abs(m) - 1)) - B_mn * (sin_matrix_a)**(np.abs(m) + 1))
# Auxilary variables
np_exp_1 = np.exp(-1j*a)/(a)
np_exp_2 = np.exp(1j * m * matrix_b)
h1 = 1j**(n+1)*np_exp_1
h2 = 1j**(n)*np_exp_1
F_s1_theta = 1j * a_n * b_m * h1 * (S_mn * np_exp_2)
F_s1_phi = -a_n * b_m * h1 * (D_mn * np_exp_2)
F_s2_theta = a_n * b_m * h2 * (D_mn * np_exp_2)
F_s2_phi = 1j * a_n * b_m * h2 * (S_mn * np_exp_2)
j = 2 * (n * (n + 1) + m - 1)
F[2 * npa, j] = F_s1_theta[npa]
F[2 * npa+1 , j] = F_s1_phi[npa]
j = 2 * (n * (n + 1) + m - 1) + 1
F[2 * npa, j] = F_s2_theta[npa]
F[2 * npa+1, j] = F_s2_phi[npa]
I made a simple test, and I get a ~3 seconds faster execution:
prev_time_ep =time.time()
npp=500
nmax=80
function_call(npp,nmax)
print("--- %s seconds ---" % (time.time() - prev_time_ep))
prev_time_ep =time.time()
function_call_2(npp,nmax)
print("--- %s seconds ---" % (time.time() - prev_time_ep))
Output:
--- 16.99950885772705 seconds ---
--- 14.231853580474854 seconds ---
Upvotes: 0