Reputation: 3
I need to calculate a double sum of the form:
wignersum{ell} = sum_{ell1} sum_{ell2} (2*ell1 + 1) * (2*ell2 + 1) * W{ell,ell1,ell2}^2 * C1(ell1) * C2(ell2)
where wignersum is an array indexed by ell, and ell, ell1, and ell2 all run from 0 to ellmax. The W{ell,ell1,ell2}^2 are a set of known coefficients (called w3j) that I've already calculated, stored in an array of shape (ellmax, ellmax, ellmax) as a global variable to be used by this function. (These coefficients are time-intensive to calculate, and I've found it faster to load them from a numpy file.) C1 and C2 are arrays of coefficients of shape (ellmax,).
I have successfully calculated this sum with a double for loop, grabbing the appropriate elements from each pre-existing array and updating the wignersum array on each iteration. I assume there is a better way to vectorize this problem to speed up the calculation. I thought about expanding the C1 and C2 arrays to the same shape as the w3j array, then multiplying these arrays elementwise before using np.sum over the ell1 and ell2 axes. I'm unsure whether this is in fact a good way to vectorize, and if it is, how to actually do it.
The code as it stands is something like
import numpy as np

ell_max = 400
w3j = np.ones((ell_max, ell_max, ell_max))
C1 = np.arange(ell_max)
C2 = np.arange(ell_max)

def function(ell_max):
    ells = np.arange(ell_max)
    wignersum = np.zeros(ell_max)
    factor = np.array([2 * i + 1 for i in range(ell_max)])
    for ell1 in ells:
        A = factor[ell1]
        B = C1[ell1]
        for ell2 in ells:
            D = factor[ell2] * C2[ell2] * w3j[:, ell1, ell2]
            wignersum += A * B * D
    return wignersum
(Note that in actuality C1 and C2 are not global variables but local variables that must be calculated from a set of parameters fed to function. This is not the limiting factor in the code's speed, however.)
With the double for loop this takes ~1.5 seconds to run for ell_max ~ 400, which is too long for my purposes. I'd like to vectorize this as much as possible to improve the speed.
Upvotes: 0
Views: 95
Reputation: 53109
You can use either einsum or matrix multiplication for a ~20x speedup:
import numpy as np

ell_max = 400
w3j = np.random.randint(1, 10, (ell_max, ell_max, ell_max))
C1 = np.random.randint(1, 10, ell_max)
C2 = np.random.randint(1, 10, ell_max)

def function(ell_max):
    ells = np.arange(ell_max)
    wignersum = np.zeros(ell_max)
    factor = np.array([2 * i + 1 for i in range(ell_max)])
    for ell1 in ells:
        A = factor[ell1]
        B = C1[ell1]
        for ell2 in ells:
            D = factor[ell2] * C2[ell2] * w3j[:, ell1, ell2]
            wignersum += A * B * D
    return wignersum

def pp_es(l_mx):
    # einsum: contract f[i]*C1[i] * f[j]*C2[j] * w3j[k,i,j] over i and j,
    # leaving k as the output index
    l = np.arange(l_mx)
    f = 2 * l + 1
    return np.einsum("i,i,j,j,kij", f, C1, f, C2, w3j, optimize=True)

def pp_mm(l_mx):
    # matrix x vector: flatten the (i,j) axes of w3j and contract with the
    # flattened outer product of f*C1 and f*C2
    l = np.arange(l_mx)
    f = 2 * l + 1
    return w3j.reshape(l_mx, -1) @ np.outer(f * C1, f * C2).ravel()
from timeit import timeit
print(timeit(lambda:pp_es(400),number=10))
print(timeit(lambda:pp_mm(400),number=10))
print(timeit(lambda:function(400),number=10))
print((pp_mm(400)==pp_es(400)).all())
print((function(400)==pp_mm(400)).all())
Sample run:
0.6061844169162214 # einsum
0.6111843499820679 # matrix x vector
12.233918005018495 # OP
True # einsum == matrix x vector
True # OP == matrix x vector
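For reference, the broadcasting approach sketched in the question (expanding C1 and C2 to the shape of w3j, multiplying elementwise, and summing over the ell1 and ell2 axes) also gives the same result. A minimal sketch, assuming the same globals as above (the name pp_bc is just illustrative); note that it materializes a full (l_mx, l_mx, l_mx) temporary, so it will typically use more memory and be slower than einsum or the matrix product:
def pp_bc(l_mx):
    # broadcasting sketch: (f*C1) varies along axis 1 (ell1), (f*C2) along
    # axis 2 (ell2); multiply into w3j and sum out both of those axes
    l = np.arange(l_mx)
    f = 2 * l + 1
    return (w3j * (f * C1)[None, :, None] * (f * C2)[None, None, :]).sum(axis=(1, 2))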
Upvotes: 1