Huayi Wei

Reputation: 849

How to multiply many matrix pairs with different shapes quickly in numpy?

I have many matrix pairs to multiply; the following is an example:

import numpy as np

D = np.random.rand(100, 10)
B = np.random.rand(10, 100)
split = np.array([ 3,  6, 11, 14, 18, 25, 31, 38, 45, 52, 60, 67, 84, 88, 90, 95])
DD = np.vsplit(D, split)
BB = np.hsplit(B, split)
G = [ m0@m1 for m0, m1 in zip(DD, BB)]

Here is a timing test on my computer:

In [42]: %timeit [m0@m1 for m0, m1 in zip(DD, BB)]
The slowest run took 10.01 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 24.5 µs per loop

Along axis 0, D is split into many small arrays with different numbers of rows, and along axis 1, B is split into many small arrays with different numbers of columns. Here I use a list comprehension to do the work.
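To illustrate the uneven splits, the first few pairs have these shapes (each product m0 @ m1 is a small square block):

for m0, m1 in list(zip(DD, BB))[:3]:
    print(m0.shape, m1.shape)
# (3, 10) (10, 3)
# (3, 10) (10, 3)
# (5, 10) (10, 5)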

Is the list comprehension the fastest way to do this? Does a faster way exist in numpy? Can one operate directly on D and B?

Upvotes: 2

Views: 333

Answers (1)

jakevdp

Reputation: 86310

If the splits are unequal sizes, the only way you could map that onto a single broadcasted operation would be to do a sparse matrix multiplication on appropriately-constructed block diagonal matrices. For example, you might approach it this way:

from scipy.sparse import block_diag

def split_dot_sparse(D, B, split):
    # create block-diagonal sparse matrices from the pieces
    DD = block_diag(np.vsplit(D, split))
    BB = block_diag(np.hsplit(B, split))

    # multiply the blocks; the product is again block-diagonal
    DDBB = DD @ BB

    # extract the square diagonal blocks; block k spans
    # rows/columns split[k-1]:split[k]
    return [DDBB[i:j, i:j].toarray()
            for i, j in zip([0, *split], [*split, D.shape[0]])]

This produces equivalent results to the list comprehension:

def split_dot_list_comp(D, B, split):
    DD = np.vsplit(D, split)
    BB = np.hsplit(B, split)
    return [m0@m1 for m0, m1 in zip(DD, BB)]

G1 = split_dot_list_comp(D, B, split)
G2 = split_dot_sparse(D, B, split)

all(np.allclose(*mats) for mats in zip(G1, G2))
# True

Unfortunately, the sparse approach is much slower than the simple list comprehension approach:

%timeit split_dot_list_comp(D, B, split)
# 73.5 µs ± 1.34 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit split_dot_sparse(D, B, split)
# 4.67 ms ± 48 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

We might be able to optimize the creation of the block matrices or the extraction of the results (you could even create a sparse matrix from a no-copy view of the data if you were very careful; see the sketch below). But even then, the sparse matrix product by itself is a factor of a few slower than the list comprehension. For uneven splits, you're not going to be able to do better than your baseline approach.
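For completeness, here is a rough sketch of that no-copy idea for the left factor: building the block-diagonal CSR matrix directly from D's flattened data, so no block copies are made. The helper name and the searchsorted-based block assignment are my own, not a scipy API; the analogous construction for B would need a CSC matrix over a Fortran-ordered array:

import numpy as np
from scipy import sparse

def block_diag_view(D, split):
    # Hypothetical helper (not a scipy function): the CSR equivalent of
    # block_diag(np.vsplit(D, split)), built on a view of D's data.
    # Assumes D is C-contiguous so that D.ravel() returns a view.
    n, m = D.shape
    # block index of each row: rows before split[0] are block 0, etc.
    block = np.searchsorted(split, np.arange(n), side='right')
    # row i stores its m values at columns m*block[i] ... m*block[i]+m-1
    indices = (m * block[:, None] + np.arange(m)).ravel()
    # each row holds exactly m nonzeros
    indptr = np.arange(0, D.size + m, m)
    nblocks = len(split) + 1
    # this constructor does not copy the data array
    return sparse.csr_matrix((D.ravel(), indices, indptr),
                             shape=(n, m * nblocks))

Even with the copies avoided, the sparse product itself dominates, so this doesn't change the conclusion.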

If you had even splits, the story would be a bit different, because then you could use numpy broadcasting to quickly compute the result. It might look something like this:

def split_dot_broadcasted(D, B, splitsize):
    # stack the row-blocks of D: shape (nblocks, splitsize, k)
    DD = D.reshape(-1, splitsize, D.shape[1])
    # stack the column-blocks of B: shape (k, nblocks, splitsize)
    BB = B.reshape(B.shape[0], -1, splitsize)
    # move the block axis first so the matmul broadcasts over blocks
    return DD @ BB.transpose(1, 0, 2)

This gives the same result as the list comprehension approach:

splitsize = 5
split = splitsize * np.arange(1, D.shape[0] // splitsize)
G1 = split_dot_list_comp(D, B, split)
G2 = split_dot_broadcasted(D, B, splitsize)
np.allclose(G1, G2)
# True

And the broadcasting approach is several times faster:

%timeit split_dot_list_comp(D, B, split)
# 83.6 µs ± 314 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit split_dot_broadcasted(D, B, splitsize)
# 29.3 µs ± 539 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So, long story short: with uneven splits, you'll probably not be able to beat the list comprehension you proposed in the question. For even-sized splits, using numpy's broadcasting will be a factor of a few faster.

Upvotes: 1
