Reputation: 849
I have many matrix pairs to multiply; the following is an example:
import numpy as np

D = np.random.rand(100, 10)
B = np.random.rand(10, 100)
split = np.array([3, 6, 11, 14, 18, 25, 31, 38, 45, 52, 60, 67, 84, 88, 90, 95])
DD = np.vsplit(D, split)
BB = np.hsplit(B, split)
G = [m0 @ m1 for m0, m1 in zip(DD, BB)]
The following is the test on my computer:
In [42]: %timeit [m0@m1 for m0, m1 in zip(DD, BB)]
The slowest run took 10.01 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 24.5 µs per loop
Along axis 0, D is split into many small arrays with different numbers of rows, and along axis 1, B is split into many small arrays with different numbers of columns. Here I use a list comprehension to do the work.
Is the list comprehension the fastest way to do this? Or is there a faster way to do it in numpy? Can it be done directly on D and B?
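To make the pairing concrete: the same split positions partition D's rows and B's columns, so each pair of blocks is conformable and each product is a square block. A quick sanity check of the shapes (using the example arrays above):

```python
import numpy as np

D = np.random.rand(100, 10)
B = np.random.rand(10, 100)
split = np.array([3, 6, 11, 14, 18, 25, 31, 38, 45, 52, 60, 67, 84, 88, 90, 95])

DD = np.vsplit(D, split)   # 17 blocks of shape (rows_i, 10)
BB = np.hsplit(B, split)   # 17 blocks of shape (10, cols_i), with rows_i == cols_i
shapes = [(m0 @ m1).shape for m0, m1 in zip(DD, BB)]
# the first few products are (3, 3), (3, 3), (5, 5), ...
```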
Upvotes: 2
Views: 333
Reputation: 86310
If the splits are unequal sizes, the only way you could map that onto a single broadcasted operation would be to do a sparse matrix multiplication on appropriately-constructed block diagonal matrices. For example, you might approach it this way:
from scipy.sparse import block_diag
def split_dot_sparse(D, B, split):
    # create block-diagonal sparse matrices
    DD = block_diag(np.vsplit(D, split))
    BB = block_diag(np.hsplit(B, split))
    # multiply the blocks
    DDBB = DD @ BB
    # extract the diagonal blocks of the result
    return [DDBB[i:j, i:j].toarray()
            for i, j in zip([0, *split], [*split, D.shape[0]])]
This produces equivalent results to the list comprehension:
def split_dot_list_comp(D, B, split):
    DD = np.vsplit(D, split)
    BB = np.hsplit(B, split)
    return [m0 @ m1 for m0, m1 in zip(DD, BB)]
G1 = split_dot_list_comp(D, B, split)
G2 = split_dot_sparse(D, B, split)
all(np.allclose(*mats) for mats in zip(G1, G2))
# True
Unfortunately, the sparse approach is much slower than the simple list comprehension approach:
%timeit split_dot_list_comp(D, B, split)
# 73.5 µs ± 1.34 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit split_dot_sparse(D, B, split)
# 4.67 ms ± 48 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
We might be able to optimize the creation of the block matrices or the extraction of the results (you could even create a sparse matrix from a no-copy view of the data if you were very careful), but even then the sparse matrix product by itself is a factor of a few slower than the list comprehension. For uneven splits, you're not going to be able to do better than your baseline approach.
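As an illustration of that "no-copy view" idea: each row of the block-diagonal DD is just the corresponding row of D placed at a column offset determined by its block, so a CSR matrix can be built whose data array is a view of D's buffer. A sketch (the helper name block_diag_rows_nocopy is my own; it assumes D is C-contiguous, and only the numeric data avoids a copy; the index arrays are still allocated):

```python
import numpy as np
from scipy.sparse import csr_matrix

def block_diag_rows_nocopy(D, split):
    """Build block_diag(np.vsplit(D, split)) as CSR without copying D's data."""
    n, k = D.shape
    n_blocks = len(split) + 1
    # block index of each row of D
    block_of_row = np.repeat(np.arange(n_blocks), np.diff([0, *split, n]))
    # row r occupies columns k*block_of_row[r] : k*(block_of_row[r] + 1)
    indices = np.tile(np.arange(k), n) + k * np.repeat(block_of_row, k)
    indptr = np.arange(0, n * k + 1, k)  # every row has exactly k nonzeros
    return csr_matrix((D.ravel(), indices, indptr), shape=(n, k * n_blocks))
```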
If you had even splits, the story would be a bit different, because then you could use numpy broadcasting to quickly compute the result. It might look something like this:
def split_dot_broadcasted(D, B, splitsize):
    DD = D.reshape(-1, splitsize, D.shape[1])
    BB = B.reshape(B.shape[0], -1, splitsize)
    return DD @ BB.transpose(1, 0, 2)
This gives the same result as the list comprehension approach:
splitsize = 5
split = splitsize * np.arange(1, D.shape[0] // splitsize)
G1 = split_dot_list_comp(D, B, split)
G2 = split_dot_broadcasted(D, B, splitsize)
np.allclose(G1, G2)
# True
And the broadcasting approach is several times faster:
%timeit split_dot_list_comp(D, B, split)
# 83.6 µs ± 314 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit split_dot_broadcasted(D, B, splitsize)
# 29.3 µs ± 539 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
So, long story short: with uneven splits, you'll probably not be able to beat the list comprehension you proposed in the question. For even-sized splits, using numpy's broadcasting will be a factor of a few faster.
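If both cases can occur in practice, the two paths could be wrapped in a small dispatcher that checks whether the blocks are all the same size (split_dot is my own name for this hypothetical helper):

```python
import numpy as np

def split_dot(D, B, split):
    # Hypothetical wrapper: use broadcasting when all blocks have equal size,
    # otherwise fall back to the list-comprehension baseline.
    sizes = np.diff([0, *split, D.shape[0]])
    if np.all(sizes == sizes[0]):
        k = int(sizes[0])
        DD = D.reshape(-1, k, D.shape[1])
        BB = B.reshape(B.shape[0], -1, k)
        return list(DD @ BB.transpose(1, 0, 2))
    return [m0 @ m1 for m0, m1 in zip(np.vsplit(D, split), np.hsplit(B, split))]
```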
Upvotes: 1