Reputation: 644
Before I start, I'll say I know this has been asked before, but I've struggled to implement the methods that have been suggested (such as running it via PyPy). This is a last-ditch attempt to speed up the code.
Basically I have a piece of code that's around 600 lines long. The bulk of it runs in around 30 seconds, but one small section (4 lines long) takes between 5 and 15 minutes. The simple reason for this is that it's a mathematical equation inside a for-loop, inside a for-loop, inside a for-loop, so the equation is evaluated on the order of 50 million times (summing 2·(k+1) over k = 0…7499 gives roughly 56 million evaluations). I accept it's going to take a while, but when the same thing runs in MATLAB it's normally done in under a minute. I believe this is because of JIT acceleration, but I might be wrong. Either way, this makes me feel there must be a way of speeding it up. The code section is below (the matrices being used are pretty big, so I'll just give their dimensions, as the numbers within them can vary).
for k in range(7500):
    for jj in range(2):
        for ii in range(k+1):
            Y[k][jj,0] += S[ii][jj] * c[k-ii][jj,jj] * U[ii][jj,jj]
where the shapes of the matrices (/arrays) are:
numpy.shape(Y) = (7500, 2, 2)
numpy.shape(S) = (7500, 2, 1)
numpy.shape(c) = (7500, 2, 2)
numpy.shape(U) = (7500, 2, 2)
Does anyone see anything I could do to speed this up?
Edit 1:
As requested, here's the MATLAB version of the above:
for k=1:7500
    for j=1:2
        for i=1:k  % runs to k, matching Python's range(k+1); i=1:7500 would index c(j,j,k+1-i) out of bounds
            Y(j,1,k) = Y(j,1,k) + S(j,1,i)*c(j,j,k+1-i)*U(j,j,i);
        end
    end
end
Edit 2:
Should have added, I'm using Python 3.4.2.
Also, sadly I don't have the source maths behind the code. I have it for around two-thirds of the code, but not the latter third; I just have the MATLAB code to convert (for now, at least).
Upvotes: 2
Views: 300
Reputation: 35145
The result can be obtained using np.convolve.
import numpy as np

S = np.random.rand(1000, 2, 1)
c = np.random.rand(1000, 2, 2)
U = np.random.rand(1000, 2, 2)

# Reference result: the original triple loop (shrunk to N = 1000 for testing).
Y = np.zeros_like(U)
for k in range(1000):
    for jj in range(2):
        for ii in range(k+1):
            Y[k,jj,0] += S[ii,jj,0] * c[k-ii,jj,jj] * U[ii,jj,jj]

# Vectorized version: one convolution per jj, truncated to the first N terms.
Yx = np.zeros_like(Y)
for jj in range(2):
    Yx[:,jj,0] += np.convolve(S[:,jj,0] * U[:,jj,jj], c[:,jj,jj], mode='full')[:Yx.shape[0]]

print(abs(Y - Yx).max())
# -> 3.12638803734e-13
How to find this? Notice that along the jj axis things are just multiplied together elementwise, and that the ii summation is actually a discrete convolution. From there it's just a matter of fiddling the indices into the form the numpy function expects.
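Spelling that out for a fixed jj, write $a_i = S_{i,jj,0}\,U_{i,jj,jj}$; the inner sum is then literally the convolution formula

$$Y_{k,jj,0} \;=\; \sum_{i=0}^{k} a_i \, c_{k-i,\,jj,\,jj} \;=\; (a * c)_k,$$

so taking the first N entries of np.convolve(a, c, mode='full') reproduces the triple loop exactly.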
If you want additional speed, substituting convolve with scipy.signal.fftconvolve may speed it up even more (see the sketch after the timings). Some timings:
for loops: 77 s
np.convolve: 33.6 ms
fftconvolve: 1.48 ms
This gives a nice ~ 50000x speedup.
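For concreteness, the substitution is a drop-in replacement (a minimal sketch, reusing S, c, U and Y from the snippet above and assuming scipy is installed; Yf is just a name picked for this illustration):

from scipy.signal import fftconvolve

# Same convolution as before, but computed via FFT: O(N log N) instead of O(N^2).
Yf = np.zeros_like(Y)
for jj in range(2):
    Yf[:,jj,0] = fftconvolve(S[:,jj,0] * U[:,jj,jj],
                             c[:,jj,jj], mode='full')[:Yf.shape[0]]

Since the FFT works in floating point, expect differences from np.convolve on the same ~1e-13 order as the discrepancy shown above.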
Note also that you should always write Y[k,jj,0] and not Y[k][jj,0] -- since there is no JIT, the latter creates a temporary array view on every evaluation, which is going to cost you if you evaluate the expression a large number of times. Rewriting the line in your for loop as

Y[k,jj,0] += S[ii,jj,0] * c[k-ii,jj,jj] * U[ii,jj,jj]

already speeds up the evaluation by a factor of 4 (!).
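If you want to see that indexing overhead in isolation, a microbenchmark along these lines (my own illustration; the array shape and repeat count are arbitrary) shows the gap:

import timeit
import numpy as np

Y = np.zeros((7500, 2, 2))

# Chained indexing: Y[42] materialises a temporary (2, 2) view on every call.
chained = timeit.timeit(lambda: Y[42][1, 0], number=1000000)

# Single tuple index: one indexing operation, no intermediate view.
single = timeit.timeit(lambda: Y[42, 1, 0], number=1000000)

print(chained, single)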
Upvotes: 2