I have a matrix, say, P, of size (X, Y). I also have two matrices, say, Kx and Ky, both of size (M, N), a matrix pk of size (M, N), and two vectors u and v of lengths X and Y respectively. For example, they can be defined as follows:
import numpy as np

M, N = 3, 4   # example sizes; the actual values do not matter here
X, Y = 5, 6

P = np.zeros((X, Y))
pk = np.random.rand(M, N)
Kx = np.random.rand(M, N)
Ky = np.random.rand(M, N)
u = np.random.rand(X)
v = np.random.rand(Y)
In the actual code they are not random, of course, but that does not matter for this example. The question is whether there exists a pure NumPy equivalent of the following:
for m in range(0, M):
    for n in range(0, N):
        for i in range(0, X):
            for j in range(0, Y):
                Arg = Kx[m,n]*u[i] + Ky[m,n]*v[j]
                P[i,j] += pk[m,n]*np.cos(Arg)
All of M, N, X and Y are different, but X and Y can be the same if the solution does not exist otherwise.
A common strategy for eliminating for-loops in NumPy calculations is to work with higher-dimensional arrays.
Consider, for example, the line
Arg = Kx[m,n]*u[i] + Ky[m,n]*v[j]
This line depends on the indices m, n, i and j, so Arg depends on m, n, i and j as well. This means Arg can be thought of as a 4-dimensional array indexed by m, n, i and j. So we can eliminate the four for-loops -- as far as Arg is concerned -- by computing
Kxu = Kx[:,:,np.newaxis]*u
Kyv = Ky[:,:,np.newaxis]*v
Arg = Kxu[:,:,:,np.newaxis] + Kyv[:,:,np.newaxis,:]
Kx[:,:,np.newaxis] has shape (M, N, 1), and u has shape (X,). Multiplying them together uses NumPy broadcasting to create an array of shape (M, N, X). Thus, above, new axes are used somewhat like placeholders, so that Arg ends up with 4 axes indexed by m, n, i, j in that order.
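A quick shape check (a minimal, self-contained sketch using small made-up sizes) confirms how the axes line up:
import numpy as np
M, N, X, Y = 2, 3, 4, 5                     # arbitrary small sizes, for illustration only
Kx = np.random.rand(M, N)
Ky = np.random.rand(M, N)
u = np.random.rand(X)
v = np.random.rand(Y)
Kxu = Kx[:,:,np.newaxis]*u                  # (M, N, 1) broadcast with (X,) -> (M, N, X)
Kyv = Ky[:,:,np.newaxis]*v                  # (M, N, 1) broadcast with (Y,) -> (M, N, Y)
Arg = Kxu[:,:,:,np.newaxis] + Kyv[:,:,np.newaxis,:]
print(Kxu.shape, Kyv.shape, Arg.shape)      # (2, 3, 4) (2, 3, 5) (2, 3, 4, 5)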
Similarly, P can be defined as
P = (pk[:,:,np.newaxis,np.newaxis]*np.cos(Arg)).sum(axis=0).sum(axis=0)
The sum(axis=0) (called twice) sums along the m and n axes, so that P ends up being a 2-dimensional array indexed by i and j only.
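As a small aside, np.sum also accepts a tuple of axes, so the two calls can be collapsed into one. Reusing pk and Arg from above, this computes the same P:
P = (pk[:,:,np.newaxis,np.newaxis]*np.cos(Arg)).sum(axis=(0, 1))   # sum over the m and n axes at once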
By working with these 4-dimensional arrays, we get to apply NumPy operations to whole NumPy arrays. In contrast, when using the 4 for-loops, we had to do computations value-by-value on scalars. Consider, for example, what np.cos(Arg) is doing when Arg is a 4-dimensional array: it off-loads the computation of all the cosines to one NumPy function call, which does the underlying loop in compiled C code. This is much, much faster than calling np.cos once for each scalar, and it is the reason why working with the higher-dimensional arrays ends up being so much faster than the for-loop-based code.
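To see that effect in isolation, here is a small timing sketch (illustrative only; the exact numbers depend on the machine):
import numpy as np
import timeit
a = np.random.rand(10000)
t_vec = timeit.timeit(lambda: np.cos(a), number=100) / 100          # one call over the whole array
t_loop = timeit.timeit(lambda: [np.cos(x) for x in a], number=10) / 10  # one np.cos call per element
print(t_vec, t_loop)   # per-run seconds; the scalar loop is typically orders of magnitude slower
Back to the full example: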
import numpy as np

def orig(Kx, Ky, u, v, pk):
    M, N = Kx.shape
    X = u.size
    Y = v.size
    P = np.zeros((X, Y), dtype=pk.dtype)   # zeros, not empty, since we accumulate with +=
    for m in range(0, M):
        for n in range(0, N):
            for i in range(0, X):
                for j in range(0, Y):
                    Arg = Kx[m,n]*u[i] + Ky[m,n]*v[j]
                    P[i,j] += pk[m,n]*np.cos(Arg)
    return P

def alt(Kx, Ky, u, v, pk):
    Kxu = Kx[:,:,np.newaxis]*u
    Kyv = Ky[:,:,np.newaxis]*v
    Arg = Kxu[:,:,:,np.newaxis] + Kyv[:,:,np.newaxis,:]
    P = (pk[:,:,np.newaxis,np.newaxis]*np.cos(Arg)).sum(axis=0).sum(axis=0)
    return P
M, N = 10, 20
X, Y = 5, 15
Kx = np.random.random((M, N))
Ky = np.random.random((M, N))
u = np.random.random(X)
v = np.random.random(Y)
pk = np.random.random((M, N))
A sanity check, showing that alt and orig return the same result:
In [57]: P2 = alt(Kx, Ky, u, v, pk)
In [58]: P1 = orig(Kx, Ky, u, v, pk)
In [59]: np.allclose(P1, P2)
Out[59]: True
A benchmark, showing alt is significantly faster than orig:
In [60]: %timeit orig(Kx, Ky, u, v, pk)
10 loops, best of 3: 33.6 ms per loop
In [61]: %timeit alt(Kx, Ky, u, v, pk)
1000 loops, best of 3: 349 µs per loop
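As a further variation (a sketch along the same lines, not part of the answer above): np.einsum can contract the m and n axes directly, which avoids broadcasting pk up to 4 dimensions before summing. For large M, N, X, Y this saves one (M, N, X, Y)-sized intermediate:
def alt2(Kx, Ky, u, v, pk):
    # Same 4-D Arg as in alt; einsum then sums pk[m,n]*cos(Arg)[m,n,i,j]
    # over m and n without materializing the broadcast product.
    Kxu = Kx[:,:,np.newaxis]*u
    Kyv = Ky[:,:,np.newaxis]*v
    Arg = Kxu[:,:,:,np.newaxis] + Kyv[:,:,np.newaxis,:]
    return np.einsum('mn,mnij->ij', pk, np.cos(Arg))
np.allclose(alt2(Kx, Ky, u, v, pk), alt(Kx, Ky, u, v, pk)) should return True.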