Reputation: 782
I am trying to port code from Matlab to SciPy. Here is a simplified version of the code I have written so far: https://gist.github.com/atmo/01b6e007be9ef90e402c . However, the Python version is considerably slower than the Matlab one. I've included profiling results in the gist; they show that Python spends almost 90% of the time evaluating the function f. Is there any way to speed up its evaluation, other than rewriting it in C or Cython?
Upvotes: 1
Views: 2924
Reputation: 13459
As I mentioned in the comments, you can get rid of about half the calls to quad (and consequently the complicated function f) if you take into account that the matrix is symmetric.
Further speed gains, still in pure Python, are to be had by rewriting that complicated function. I did most of that in sympy (sketched just below).
Finally, I vectorized the call to quad using np.vectorize.
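For reference, the sympy step amounts to something like an expand-and-collect on powers of exp(2*x). A minimal sketch with a made-up stand-in expression (the real integrand is in the gist):

import sympy as sp

F = sp.symbols('F', positive=True)   # F stands in for exp(2*a*lmbd)
# Made-up stand-in expression; substitute the actual integrand from the gist.
expr = F*(3*F + 1)*(F + 2)**2 + 5*F**2
print(sp.collect(sp.expand(expr), F))
# 3*F**4 + 13*F**3 + 21*F**2 + 4*F  -- one coefficient per power of F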
from scipy.integrate import quad
from scipy.special import jn as besselj
from scipy import exp, zeros, linspace
from scipy.linalg import norm
import numpy as np
def complicated_func(lmbd, a, n, k):
    # Integrand: a polynomial in exp(2*a*lmbd) divided by lmbd, weighted by
    # two Bessel functions of the first kind.
    u, v, w = 5, 3, 2
    x = a*lmbd
    fac = exp(2*x)
    comm = 2*w + x
    part1 = ((v**2 + 4*w*(w + 2*x) + 2*x*(x - 1))*fac**5
             + 2*u*fac**4
             + (-v**2 - 4*(w*(3*w + 4*x + 1) + x*(x - 2)) + 1)*fac**3
             + (-8*(w + x) + 2)*fac**2
             + (2*comm*(comm + 1) - 1)*fac)
    return part1/lmbd * besselj(n+1, lmbd) * besselj(k+1, lmbd)

def perform_quad(n, k, a):
    return quad(complicated_func, 0, np.inf, args=(a, n, k))[0]

def improved_main():
    sz = 20
    amatrix = np.zeros((sz, sz))
    ls = -np.linspace(1, 10, 20)/2
    # Only the lower triangle needs to be computed: the matrix is symmetric.
    inds = np.tril_indices(sz)
    # Vectorize over the (n, k) index pairs and broadcast over the values in ls.
    myv3 = np.vectorize(perform_quad)
    res = myv3(inds[0], inds[1], ls.reshape(-1, 1))
    results = np.empty(res.shape[0])
    for rowind, row in enumerate(res):
        # Fill the lower triangle and mirror it to recover the full matrix.
        amatrix[inds] = row
        symm_matrix = amatrix + amatrix.T - np.diag(amatrix.diagonal())
        results[rowind] = norm(symm_matrix)
    return results
Timing results show a speed increase of a factor of 5 (you'll forgive me if I only ran it once; it takes long enough as it is):
In [11]: %timeit -n1 -r1 improved_main()
1 loops, best of 1: 6.92 s per loop
In [12]: %timeit -n1 -r1 main()
1 loops, best of 1: 35.9 s per loop
There was also a microgain to be had by replacing v immediately with its square, because that's the only way it is used in that complicated function: as its square.
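Concretely, that micro-optimization would look like this (a sketch of the same function with v replaced up front by v2 = v**2 = 9):

from scipy import exp
from scipy.special import jn as besselj

def complicated_func2(lmbd, a, n, k):
    u, v2, w = 5, 9, 2   # v2 = v**2; v itself never appears unsquared
    x = a*lmbd
    fac = exp(2*x)
    comm = 2*w + x
    # Identical to complicated_func, with v**2 replaced by v2 in both places.
    part1 = ((v2 + 4*w*(w + 2*x) + 2*x*(x - 1))*fac**5
             + 2*u*fac**4
             + (-v2 - 4*(w*(3*w + 4*x + 1) + x*(x - 2)) + 1)*fac**3
             + (-8*(w + x) + 2)*fac**2
             + (2*comm*(comm + 1) - 1)*fac)
    return part1/lmbd * besselj(n+1, lmbd) * besselj(k+1, lmbd)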
There's also an extreme amount of repetition in the calls to besselj, but I don't see how to avoid that: quad determines the values of lmbd, so you can't easily precompute those values and then perform a lookup.
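If you want to see that repetition for yourself, you could wrap besselj in a counting shim (a quick diagnostic, not part of the solution; counting_besselj is my name for it):

from collections import Counter
from scipy.special import jn

order_counts = Counter()

def counting_besselj(n, lmbd):
    # Tally how often each order is requested: lmbd varies from call to call,
    # but the orders n+1 and k+1 repeat constantly.
    order_counts[n] += 1
    return jn(n, lmbd)

Swap counting_besselj in for besselj inside complicated_func and inspect order_counts after a run.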
If you profile improved_main, you'll see that the number of calls to complicated_func has decreased by nearly a factor of 2 (the diagonal still needs to be computed). All the other speed gains can be attributed to np.vectorize and the improvements to complicated_func.
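To reproduce that profile outside of IPython, something like this works (assuming improved_main is defined in the same session):

import cProfile

# Sort by cumulative time to see quad and complicated_func at the top.
cProfile.run('improved_main()', sort='cumulative')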
I don't have Matlab on my system, so I can't say how much speed Matlab itself would gain from the same improvements to the complicated function.
Upvotes: 1
Reputation: 231665
Your numpy version is probably comparable in speed to older MATLAB runs. But newer MATLAB versions do various forms of just-in-time compilation that speed up repeated calculations considerably.
My guess is that you can nibble away at the lambda and f code, and maybe cut their evaluation times in half. But the real killer is that you are calling f so many times.
For a start I'd try to precalculate things in f. For example, define K1 = K[1] and use K1 in the calculations; that will reduce the number of indexing calls. Are any of the exponentials repeated? Maybe replace the lambda definition with a regular def, or combine it with f.
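Since the original f lives in the gist rather than in this post, here is only a hypothetical illustration of that pattern (the names K and K1 and the expression are invented):

from scipy import exp

def f(lmbd, K):
    # Before: exp(K[1]*lmbd) + K[1]*exp(K[1]*lmbd) indexes K twice and
    # evaluates the same exponential twice per call.
    K1 = K[1]           # index once, reuse the local name
    e = exp(K1*lmbd)    # evaluate the repeated exponential once
    return e + K1*e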
Upvotes: 1