Joseph Hastings
Joseph Hastings

Reputation: 561

How can I vectorize this triple-loop over 2d arrays in numpy?

Can I eliminate all Python loops in this computation:

result[i,j,k] = (x[i] * y[j] * z[k]).sum()

where x[i], y[j], z[k] are vectors of length N and x,y,z have first dimensions with length A,B,C s.t. output is shape (A,B,C) and each element is the sum of a triple-product (element-wise).

I can get it down from 3 to 1 loops (code below), but am stuck trying to eliminate the last loop.

If necessary I could make A=B=C (via small amount of padding).

# Example with 3 loops, 2 loops, 1 loop (testing omitted)

N = 100 # more like 100k in real problem
A =   2 # more like 20 in real problem
B =   3 # more like 20 in real problem
C =   4 # more like 20 in real problem

import numpy
x = numpy.random.rand(A, N)
y = numpy.random.rand(B, N)
z = numpy.random.rand(C, N)

# outputs of each variant
result_slow = numpy.empty((A,B,C))
result_vec_C = numpy.empty((A,B,C))
result_vec_CB = numpy.empty((A,B,C))

# 3 nested loops
for i in range(A):
    for j in range(B):
        for k in range(C):
            result_slow[i,j,k] = (x[i] * y[j] * z[k]).sum()

# vectorize loop over C (2 nested loops)
for i in range(A):
    for j in range(B):
        result_vec_C[i,j,:] = (x[i] * y[j] * z).sum(axis=1)

# vectorize one C and B (one loop)
for i in range(A):
    result_vec_CB[i,:,:] = numpy.dot(x[i] * y, z.transpose())

numpy.testing.assert_almost_equal(result_slow, result_vec_C)
numpy.testing.assert_almost_equal(result_slow, result_vec_CB)

Upvotes: 7

Views: 1619

Answers (2)

senderle
senderle

Reputation: 151097

Using einsum makes a lot of sense in your case; but you can do this pretty easily by hand. The trick is to make the arrays broadcastable against one another. That means reshaping them so that each array varies independently along its own axis. Then multiply them together, letting numpy take care of the broadcasting; and then sum along the last (rightmost) axis.

>>> x = numpy.arange(2 * 4).reshape(2, 4)
>>> y = numpy.arange(3 * 4).reshape(3, 4)
>>> z = numpy.arange(4 * 4).reshape(4, 4)
>>> (x.reshape(2, 1, 1, 4) * 
...  y.reshape(1, 3, 1, 4) *
...  z.reshape(1, 1, 4, 4)).sum(axis=3)
array([[[  36,   92,  148,  204],
        [  92,  244,  396,  548],
        [ 148,  396,  644,  892]],

       [[  92,  244,  396,  548],
        [ 244,  748, 1252, 1756],
        [ 396, 1252, 2108, 2964]]])

You can make this a bit more generalized by using slice notation, the newaxis value (which is equal to None, so the below would work with None as well), and the fact that sum accepts negative axis values (with -1 denoting the last, -2 denoting the next-to-last, and so on). This way, you don't have to know the original shape of the arrays; as long as their last axes are compatible, this will broadcast the first three together:

>>> (x[:, numpy.newaxis, numpy.newaxis, :] *
...  y[numpy.newaxis, :, numpy.newaxis, :] *
...  z[numpy.newaxis, numpy.newaxis, :, :]).sum(axis=-1)
array([[[  36,   92,  148,  204],
        [  92,  244,  396,  548],
        [ 148,  396,  644,  892]],

       [[  92,  244,  396,  548],
        [ 244,  748, 1252, 1756],
        [ 396, 1252, 2108, 2964]]])

Upvotes: 7

JoshAdel
JoshAdel

Reputation: 68702

If you are using numpy > 1.6, there is the awesome np.einsum function:

np.einsum('im,jm,km->ijk',x,y,z)

Which is equivalent to your looped versions. I'm not sure how this will fair on efficiency once you get up to the size of your arrays in the real problem (I'm actually getting a segfault on my machine, when I move to those sizes). The other solution which I often prefer for these sorts of problems is re-writing the method using cython.

Upvotes: 9

Related Questions