IanRoberts
IanRoberts

Reputation: 2956

Optimizing numpy matrix operations (currently using a for loop)

I've written some code to compute n matrices based on n items in a list, and then multiply all matrices together at the end.

The code is relatively slow and I'd like to learn more about python optimization. I've used the profiling tools and identified the slow-down in my program is this matrix multiplication loop.

I wonder if anyone has any recommendations for how I might speed this up, perhaps taking advantage of built in C-based functions in Python / NumPy?

def my_matrix(x):

    # Initialise overall matrix as an identity matrix
    # | M_11 M_12 |
    # | M_21 M_22 |
    M = np.matrix([[1, 0],[0, 1]])

    for z in z_all:
        param1 = func1(z)
        param2 = func2(x, z)
        param3 = func3(x, z)

        M_11 = param1 + param2
        M_12 = param1 - param2
        M_21 = param1 * param2
        M_22 = param1 / param2

        # Multiply matrix with overall master matrix
        M = M * np.matrix([[M_11, M_12],[M_21, M_22]])
    return M

From a little background reading, it seems that function calls are computationally expensive, and so, computing arrays for my params and then accessing the arrays could be more efficient than evaluating the function within the loop each time... e.g.

 param1s = funcs(z_all)
 param2s = funcs(x, z_all)
 etc

and then in the for loop:

for i, z in enumerate(z_all):
    param1 = params1[i]
    param2 = params2[i]
 etc.

This is quicker, but only by about 10%, as the saving from fewer function calls is offset by time taken in array access using param1 = params1[i] in the loop.

Does anyone have any recommendations please?

Upvotes: 1

Views: 1913

Answers (1)

ali_m
ali_m

Reputation: 74172

You could vectorize your computation of M_11, ... M_22 by doing M_11s = params1 + params2 etc.

That way you'll only have to do the matrix multiplications in the loop:

import numpy as np

...

# compute your 'params' over vectors of z-values
param1s = func1(z_all)
param2s = func2(x, z_all)
param3s = func3(x, z_all)  # you don't seem to be using this for anything...

# compute 'M_11, ... M_22'
M_11 = param1s + param2s
M_12 = param1s - param2s
M_21 = param1s * param2s
M_22 = param1s / param2s

# we construct a (2, 2, nz) array from these
M_all = np.array([[M_11, M_12], [M_21, M_22]])

# roll the 'nz' axis to the front so that its shape is (nz, 2, 2)
M_all = np.rollaxis(M_all, -1, 0)

# initialize output with the identity
M_out = np.eye(2)

# loop over each (2, 2) subarray in 'M_all', update the output with the
# corresponding dot product
for mm in M_all:
    M_out = M_out.dot(mm)

Upvotes: 1

Related Questions