Steve

Reputation: 644

Struggling With Python For-Loop Speed

Before I start, I will say I know this has been asked before, but I've struggled to implement the methods that have been suggested (such as running it via PyPy). This is a last-ditch attempt to speed up the code.

Basically I have a piece of code that's around 600 lines long. The bulk of it runs in around 30 seconds, but one small section (4 lines long) takes between 5 and 15 minutes. The reason is simple: it's a mathematical equation inside a for-loop, inside a for-loop, inside a for-loop, so the equation is being evaluated on the order of 50 million times. I accept it's going to take a while, but when the same thing is run in MATLAB it normally finishes in under a minute. I believe this is because of MATLAB's JIT acceleration, but I might be wrong. Either way, this makes me feel there must be a way of speeding it up. The code section is below (the matrices being used are pretty big, so I'll just give their dimensions, as the numbers within them can vary).

    for k in range(7500):
        for jj in range(2):
            for ii in range(k+1):
                Y[k][jj,0] += S[ii][jj] * c[k-ii][jj,jj] * U[ii][jj,jj]

where the sizes of the matrices (arrays) are:

numpy.shape(Y) = (7500, 2, 2)
numpy.shape(S) = (7500, 2, 1)
numpy.shape(c) = (7500, 2, 2)
numpy.shape(U) = (7500, 2, 2)

Does anyone see anything I could do to speed this up?

Edit 1:

As requested here's the MATLAB version of the above:

for k=1:7500
    for j=1:2
       for i=1:7500

           Y(j,1,k)=Y(j,1,k)+S(j,1,i)*c(j,j,k+1-i)*U(j,j,i);

       end
    end
end

Edit 2:

I should have added: I'm using Python 3.4.2.

Also, sadly, I don't have the source maths behind the code. I have it for around two-thirds of the code, but not the final third; for now I just have the MATLAB code to convert.

Upvotes: 2

Views: 300

Answers (1)

pv.

Reputation: 35145

The result can be obtained using np.convolve.

import numpy as np

S = np.random.rand(1000, 2, 1)
c = np.random.rand(1000, 2, 2)
U = np.random.rand(1000, 2, 2)

Y = np.zeros_like(U)
for k in range(1000):
    for jj in range(2):
        for ii in range(k+1):
            Y[k,jj,0] += S[ii,jj,0] * c[k-ii,jj,jj] * U[ii,jj,jj]

Yx = np.zeros_like(Y)
for jj in range(2):
    Yx[:,jj,0] += np.convolve(S[:,jj,0] * U[:,jj,jj], c[:,jj,jj], mode='full')[:Yx.shape[0]]

print(abs(Y - Yx).max())
# -> 3.12638803734e-13

How to find this? Notice that everything is just multiplied together along the jj axis, and that the ii summation is actually a discrete convolution. Then it's just a matter of getting the indices right in the NumPy call.
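As a minimal illustrative check (not part of the original answer), here is the identity in question: the inner ii sum, sum over ii of a[ii] * b[k-ii], is by definition the k-th element of the full discrete convolution of a and b.

```python
import numpy as np

a = np.random.rand(50)
b = np.random.rand(50)

k = 20
# Direct sum, mirroring the inner ii loop of the question.
direct = sum(a[ii] * b[k - ii] for ii in range(k + 1))
# Same value read off the full convolution output.
via_convolve = np.convolve(a, b, mode='full')[k]

print(np.allclose(direct, via_convolve))
# -> True
```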

If you want additional speed, substituting convolve with scipy.signal.fftconvolve may speed it up even more. Some timings:

for loops:         77 s
np.convolve:       33.6 ms
fftconvolve:       1.48 ms

This gives a nice ~ 50000x speedup.
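A sketch of the fftconvolve substitution (assuming SciPy is installed); it is a drop-in replacement for np.convolve in the loop above:

```python
import numpy as np
from scipy.signal import fftconvolve

n = 1000
S = np.random.rand(n, 2, 1)
c = np.random.rand(n, 2, 2)
U = np.random.rand(n, 2, 2)

# Same computation as the np.convolve version, with fftconvolve swapped in.
Yf = np.zeros_like(U)
for jj in range(2):
    Yf[:, jj, 0] = fftconvolve(S[:, jj, 0] * U[:, jj, jj],
                               c[:, jj, jj], mode='full')[:n]
```

FFT-based convolution costs O(n log n) instead of O(n^2), which is where the extra factor comes from at n = 7500.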

Note also that you should always write Y[k,jj,0] and not Y[k][jj,0] -- since there is no JIT, the latter creates a temporary array view, which is going to cost you if you evaluate the expression a large number of times. Rewriting the line in your for loop as

Y[k,jj,0] += S[ii,jj,0] * c[k-ii,jj,jj] * U[ii,jj,jj]

speeds up the evaluation already by a factor of 4 (!).
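A quick way to see this for yourself (an illustrative sketch; exact timings vary by machine): the chained form Y[k][...] materialises an intermediate 2-D view on every access, while Y[k, jj, 0] indexes in a single step.

```python
import numpy as np
import timeit

Y = np.random.rand(7500, 2, 2)

# Both expressions read the same element...
assert Y[100][1, 0] == Y[100, 1, 0]

# ...but the chained form constructs a temporary view first.
chained = timeit.timeit(lambda: Y[100][1, 0], number=100000)
single = timeit.timeit(lambda: Y[100, 1, 0], number=100000)
print("chained: %.3fs, single: %.3fs" % (chained, single))
```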

Upvotes: 2
