I have a matrix, say, P, of size (X, Y). I also have two matrices, say, Kx and Ky, both of size (M, N), a matrix pk of size (M, N), and two vectors u and v of lengths X and Y respectively. For example, they can be defined as follows:
import numpy as np

M, N = 3, 4   # example sizes; the actual values do not matter here
X, Y = 5, 6

P = np.zeros((X, Y))
pk = np.random.rand(M, N)
Kx = np.random.rand(M, N)
Ky = np.random.rand(M, N)
u = np.random.rand(X)
v = np.random.rand(Y)
In the actual code they are not random, of course, but that does not matter for this example. The question is whether there exists a pure NumPy equivalent of the following:
for m in range(0, M):
    for n in range(0, N):
        for i in range(0, X):
            for j in range(0, Y):
                Arg = Kx[m,n]*u[i] + Ky[m,n]*v[j]
                P[i,j] += pk[m,n]*np.cos(Arg)
All of M, N, X and Y are different, but X and Y can be the same if the solution does not exist otherwise.
A common strategy for eliminating for-loops in NumPy calculations is to work with higher-dimensional arrays.
Consider, for example, the line
Arg = Kx[m,n]*u[i] + Ky[m,n]*v[j]
This line depends on the indices m, n, i and j, so Arg depends on m, n, i and j as well. This means Arg can be thought of as a 4-dimensional array indexed by m, n, i and j. So we can eliminate the four for-loops -- as far as Arg is concerned -- by computing
Kxu = Kx[:,:,np.newaxis]*u
Kyv = Ky[:,:,np.newaxis]*v
Arg = Kxu[:,:,:,np.newaxis] + Kyv[:,:,np.newaxis,:]
Kx[:,:,np.newaxis] has shape (M, N, 1), and u has shape (X,). Multiplying them together uses NumPy broadcasting to create an array of shape (M, N, X). Thus, above, new axes are used somewhat like placeholders, so that Arg ends up with 4 axes indexed by m, n, i, j in that order.
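A quick shape check (a minimal, self-contained sketch using small made-up sizes) confirms how the axes line up:
import numpy as np
M, N, X, Y = 2, 3, 4, 5                     # arbitrary small sizes, for illustration only
Kx = np.random.rand(M, N)
Ky = np.random.rand(M, N)
u = np.random.rand(X)
v = np.random.rand(Y)
Kxu = Kx[:,:,np.newaxis]*u                  # (M, N, 1) broadcast with (X,) -> (M, N, X)
Kyv = Ky[:,:,np.newaxis]*v                  # (M, N, 1) broadcast with (Y,) -> (M, N, Y)
Arg = Kxu[:,:,:,np.newaxis] + Kyv[:,:,np.newaxis,:]
print(Kxu.shape, Kyv.shape, Arg.shape)      # (2, 3, 4) (2, 3, 5) (2, 3, 4, 5)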
Similarly, P can be defined as
P = (pk[:,:,np.newaxis,np.newaxis]*np.cos(Arg)).sum(axis=0).sum(axis=0)
The sum(axis=0) (called twice) sums along the m and n axes, so that P ends up being a 2-dimensional array indexed by i and j only.
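As a small aside, np.sum also accepts a tuple of axes, so the two calls can be collapsed into one. Reusing pk and Arg from above, this computes the same P:
P = (pk[:,:,np.newaxis,np.newaxis]*np.cos(Arg)).sum(axis=(0, 1))   # sum over the m and n axes at once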
By working with these 4-dimensional arrays, we get to apply NumPy operations to whole NumPy arrays. In contrast, when using the 4 for-loops, we had to do computations value-by-value on scalars. Consider, for example, what np.cos(Arg) is doing when Arg is a 4-dimensional array: it off-loads the computation of all the cosines to one NumPy function call, which does the underlying loop in compiled C code. This is much, much faster than calling np.cos once for each scalar, and it is the reason why working with the higher-dimensional arrays ends up being so much faster than the for-loop-based code.
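To see that effect in isolation, here is a small timing sketch (illustrative only; the exact numbers depend on the machine):
import numpy as np
import timeit
a = np.random.rand(10000)
t_vec = timeit.timeit(lambda: np.cos(a), number=100) / 100          # one call over the whole array
t_loop = timeit.timeit(lambda: [np.cos(x) for x in a], number=10) / 10  # one np.cos call per element
print(t_vec, t_loop)   # per-run seconds; the scalar loop is typically orders of magnitude slower
Back to the full example: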
import numpy as np

def orig(Kx, Ky, u, v, pk):
    M, N = Kx.shape
    X = u.size
    Y = v.size
    P = np.zeros((X, Y), dtype=pk.dtype)   # zeros, not empty, since we accumulate with +=
    for m in range(0, M):
        for n in range(0, N):
            for i in range(0, X):
                for j in range(0, Y):
                    Arg = Kx[m,n]*u[i] + Ky[m,n]*v[j]
                    P[i,j] += pk[m,n]*np.cos(Arg)
    return P

def alt(Kx, Ky, u, v, pk):
    Kxu = Kx[:,:,np.newaxis]*u
    Kyv = Ky[:,:,np.newaxis]*v
    Arg = Kxu[:,:,:,np.newaxis] + Kyv[:,:,np.newaxis,:]
    P = (pk[:,:,np.newaxis,np.newaxis]*np.cos(Arg)).sum(axis=0).sum(axis=0)
    return P
M, N = 10, 20
X, Y = 5, 15
Kx = np.random.random((M, N))
Ky = np.random.random((M, N))
u = np.random.random(X)
v = np.random.random(Y)
pk = np.random.random((M, N))
A sanity check, showing that alt and orig return the same result:
In [57]: P2 = alt(Kx, Ky, u, v, pk)
In [58]: P1 = orig(Kx, Ky, u, v, pk)
In [59]: np.allclose(P1, P2)
Out[59]: True
A benchmark, showing alt is significantly faster than orig:
In [60]: %timeit orig(Kx, Ky, u, v, pk)
10 loops, best of 3: 33.6 ms per loop
In [61]: %timeit alt(Kx, Ky, u, v, pk)
1000 loops, best of 3: 349 µs per loop
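As a further variation (a sketch along the same lines, not part of the answer above): np.einsum can contract the m and n axes directly, which avoids broadcasting pk up to 4 dimensions before summing. For large M, N, X, Y this saves one (M, N, X, Y)-sized intermediate:
def alt2(Kx, Ky, u, v, pk):
    # Same 4-D Arg as in alt; einsum then sums pk[m,n]*cos(Arg)[m,n,i,j]
    # over m and n without materializing the broadcast product.
    Kxu = Kx[:,:,np.newaxis]*u
    Kyv = Ky[:,:,np.newaxis]*v
    Arg = Kxu[:,:,:,np.newaxis] + Kyv[:,:,np.newaxis,:]
    return np.einsum('mn,mnij->ij', pk, np.cos(Arg))
np.allclose(alt2(Kx, Ky, u, v, pk), alt(Kx, Ky, u, v, pk)) should return True.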