Huayi Wei

Reputation: 849

How to apply same operation on every block of block matrices in numpy efficiently?

I have a big 2d array as following shape:

B = [B_0, B_1, B_2, ..., B_n]

where B_0, B_1, ..., B_n all have the same number of rows but different numbers of columns, and n may be very large. I also have another 1d array idx with shape (n+1,), such that

B_i = B[:, idx[i]:idx[i+1]]

and idx[-1] (the last element of idx) is the total number of columns of B.

I want to apply the same matrix operation to every B_i, for example:

B_i.T@B_i

Or with another 2d array:

D = [[D_0], [D_1], ..., [D_n]]

where D_0, D_1, ..., D_n all have the same number of columns (equal to the number of rows of B) but different numbers of rows, and

D_i = D[idx[i]:idx[i+1], :]

and I want to compute D_i@B_i.

So my question is: how can I implement this in Python while avoiding the for loop?

The following is an example:

import numpy as np
from timeit import default_timer as timer
# Prepare the test data
n = 1000000 # the number of small matrices

idx = np.zeros(n+1, dtype=int)
idx[1:] = np.random.randint(1, 10, size=n)
idx = np.cumsum(idx)

B = np.random.rand(3, idx[-1])

# Computation
start = timer()
C = []
for i in range(n):
    B_i = B[:, idx[i]:idx[i+1]]
    C_i = B_i.T@B_i
    C.append(C_i)
end = timer()
print('Total time:', end - start)
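
A loop version of the second operation (D_i @ B_i) looks similar. A minimal sketch, assuming D is built as described above (the names D, E are illustrative):

D = np.random.rand(idx[-1], 3)  # block column matrix [[D_0], ..., [D_n]]

E = []
for i in range(n):
    D_i = D[idx[i]:idx[i+1], :]   # the rows belonging to block i
    B_i = B[:, idx[i]:idx[i+1]]   # the columns belonging to block i
    E.append(D_i @ B_i)           # E[i] has shape (w_i, w_i), w_i = idx[i+1]-idx[i]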

Upvotes: 2

Views: 301

Answers (2)

Huayi Wei

Reputation: 849

One can use map with a lambda function to do this job; see the following code:

import numpy as np
from timeit import default_timer as timer
# Prepare the test data
n = 1000000 # the number of small matrices

idx = np.zeros(n+1, dtype=int)
idx[1:] = np.random.randint(1, 10, size=n)
idx = np.cumsum(idx)

B = np.random.rand(3, idx[-1])
D = np.random.rand(idx[-1], 3)

BB = np.hsplit(B, idx[1:-1])  # split B at the column boundaries into [B_0, ..., B_n]
DD = np.vsplit(D, idx[1:-1])  # split D at the row boundaries into [D_0, ..., D_n]

CC = list(map(lambda x: x[0]@x[1], zip(DD, BB)))  # CC[i] = D_i @ B_i
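
The same pattern handles the first operation too; a minimal sketch:

CC2 = list(map(lambda b: b.T@b, BB))  # CC2[i] = B_i.T @ B_i

Note that map still iterates at the Python level, so the speedup over an explicit for loop is modest; hsplit/vsplit mainly make the code more compact.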

Upvotes: 0

hpaulj

Reputation: 231510

If I add to your code:

print(B.shape)
print(idx)
print([x.shape for x in C])

# Build a block-diagonal version of B: block i occupies rows 3*i:3*(i+1)
# and columns idx[i]:idx[i+1]; everything else stays zero.
Bnn = np.zeros((n, 3, idx[-1]))
for i in range(n):
    s = np.s_[idx[i]:idx[i+1]]
    Bnn[i,:,s] = B[:, s]
Bnn = Bnn.reshape(3*n,-1)
Cnn = Bnn.T @ Bnn   # one big product; the result is block diagonal
print(Bnn.shape, Cnn.shape)
print(Cnn.sum(), sum([x.sum() for x in C]))

and change n to 5, I get

2115:~/mypy$ python3 stack46209231.py 
(3, 31)    # B shape
[ 0  9 17 18 25 31]
[(9, 9), (8, 8), (1, 1), (7, 7), (6, 6)]  # shapes of C elements
(15, 31) (31, 31)     # shapes of diagonalized B and C
197.407879357 197.407879357   # C sums from the 2 routes

So my idea of building a block-diagonal version of B and performing the dot product with that works. For modest sized arrays that should be faster, though the iteration to create Bnn takes time, as does extracting the blocks from Cnn.
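
The individual products can then be read off the diagonal blocks of Cnn; a minimal sketch, assuming idx as above:

C_from_Cnn = [Cnn[idx[i]:idx[i+1], idx[i]:idx[i+1]] for i in range(n)]  # C_from_Cnn[i] == B_i.T @ B_i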

But Bnn and Cnn get very large as n grows, bog down with memory swapping, and eventually fail with memory errors.


With scipy's sparse.block_diag function, turning your B into a sparse block-diagonal matrix is quite easy:

from scipy import sparse

Blist = [B[:, idx[i]:idx[i+1]] for i in range(n)]  # the list [B_0, ..., B_n]
Bs = sparse.block_diag(Blist, format='bsr')        # sparse block-diagonal matrix
print(repr(Bs))
Cs = Bs.T@Bs
print(repr(Cs))
print(Cs.sum())

and a sample run

2158:~/mypy$ python3 stack46209231.py 
(3, 20)
[ 0  1  5  9 17 20]
[(1, 1), (4, 4), (4, 4), (8, 8), (3, 3)]
(15, 20) (20, 20)
94.4190125992 94.4190125992
<15x20 sparse matrix of type '<class 'numpy.float64'>'
    with 60 stored elements (blocksize = 1x1) in Block Sparse Row format>
<20x20 sparse matrix of type '<class 'numpy.float64'>'
    with 106 stored elements (blocksize = 1x1) in Block Sparse Row format>

and shapes and checksums match.

For n = 10000, Bnn is too large for my memory. The sparse Bs creation is slow, but the matrix multiplication is fast.
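
Recovering the dense per-block results from the sparse product takes a bit of slicing; a minimal sketch, converting to CSR first since the BSR format does not support slicing directly:

Ccsr = Cs.tocsr()
C_blocks = [Ccsr[idx[i]:idx[i+1], idx[i]:idx[i+1]].toarray() for i in range(n)]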

Upvotes: 1
