Reputation: 849
I have a big 2D array with the following block structure:
B = [B_0, B_1, ..., B_n]
where B_0, B_1, ..., B_n all have the same number of rows but different numbers of columns, and n may be very large. I also have another 1D array idx with shape (n+1,) such that
B_i = B[:, idx[i]:idx[i+1]]
and idx[-1] (the last element of idx) is the total number of columns of B.
I want to do the same matrix operation on every B_i, for example:
B_i.T @ B_i
Or, with another 2D array
D = [[D_0], [D_1], ..., [D_n]]
where D_0, D_1, ..., D_n all have the same number of columns (equal to the number of rows of B) but different numbers of rows, and
D_i = D[idx[i]:idx[i+1], :]
I want to compute D_i @ B_i.
So my question is: how can I implement this in Python while avoiding the for loop?
The following is an example:
import numpy as np
from timeit import default_timer as timer

# Prepare the test data
n = 1000000  # the number of small matrices
idx = np.zeros(n+1, dtype=int)
idx[1:] = np.random.randint(1, 10, size=n)
idx = np.cumsum(idx)
B = np.random.rand(3, idx[-1])

# Computation
start = timer()
C = []
for i in range(n):
    B_i = B[:, idx[i]:idx[i+1]]
    C_i = B_i.T @ B_i
    C.append(C_i)
end = timer()
print('Total time:', end - start)
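For the second operation with D, the straightforward loop looks similar. This is just a minimal sketch continuing the snippet above (D is made-up random test data whose number of columns equals the number of rows of B):
# Loop version of D_i @ B_i; each product has shape (idx[i+1]-idx[i], idx[i+1]-idx[i])
D = np.random.rand(idx[-1], 3)
E = []
for i in range(n):
    D_i = D[idx[i]:idx[i+1], :]
    B_i = B[:, idx[i]:idx[i+1]]
    E.append(D_i @ B_i)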
Upvotes: 2
Views: 301
Reputation: 849
One can use map and a lambda function to do this job; please see the following code:
import numpy as np
from timeit import default_timer as timer
# Prepare the test data
n = 1000000  # the number of small matrices
idx = np.zeros(n+1, dtype=int)
idx[1:] = np.random.randint(1, 10, size=n)
idx = np.cumsum(idx)
B = np.random.rand(3, idx[-1])
D = np.random.rand(idx[-1], 3)
BB = np.hsplit(B, idx[1:-1])
DD = np.vsplit(D, idx[1:-1])
CC = list(map(lambda x: x[0]@x[1], zip(DD, BB)))
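The first operation from the question fits the same pattern (a minimal sketch reusing BB from above). Note that map still iterates over the blocks at the Python level, so the gain over an explicit for loop is mainly conciseness:
# Same map/lambda idea for B_i.T @ B_i on each block produced by np.hsplit
CC2 = list(map(lambda b: b.T @ b, BB))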
Upvotes: 0
Reputation: 231510
If I add to your code:
print(B.shape)
print(idx)
print([x.shape for x in C])
Bnn = np.zeros((n, 3, idx[-1]))
for i in range(n):
    s = np.s_[idx[i]:idx[i+1]]
    Bnn[i, :, s] = B[:, s]
Bnn = Bnn.reshape(3*n,-1)
Cnn = Bnn.T @ Bnn
print(Bnn.shape, Cnn.shape)
print(Cnn.sum(), sum([x.sum() for x in C]))
and change n to 5, I get:
2115:~/mypy$ python3 stack46209231.py
(3, 31) # B shape
[ 0 9 17 18 25 31]
[(9, 9), (8, 8), (1, 1), (7, 7), (6, 6)] # shapes of C elements
(15, 31) (31, 31) # shapes of diagonalized B and C
197.407879357 197.407879357 # C sums from the 2 routes
So my idea of making a "diagonalized" version of B and performing the dot product with that works. For modest-sized arrays that should be faster, though the iteration to create Bnn takes time, as does extracting the blocks from Cnn.
But Bnn and Cnn get very large, get bogged down with memory swapping, and eventually hit memory errors.
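For completeness, extracting the individual C_i results back out of Cnn is just diagonal-block slicing (a minimal sketch, assuming the Bnn/Cnn built above):
# Each C_i = B_i.T @ B_i sits on the diagonal of Cnn at rows/cols idx[i]:idx[i+1]
C_blocks = [Cnn[idx[i]:idx[i+1], idx[i]:idx[i+1]] for i in range(n)]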
With the block_diag function from scipy.sparse, turning your B into a sparse block-diagonal matrix is quite easy:
from scipy import sparse
Blist = [B[:, idx[i]:idx[i+1]] for i in range(n)]
Bs = sparse.block_diag(Blist, format='bsr')
print(repr(Bs))
Cs = Bs.T@Bs
print(repr(Cs))
print(Cs.sum())
and a sample run
2158:~/mypy$ python3 stack46209231.py
(3, 20)
[ 0 1 5 9 17 20]
[(1, 1), (4, 4), (4, 4), (8, 8), (3, 3)]
(15, 20) (20, 20)
94.4190125992 94.4190125992
<15x20 sparse matrix of type '<class 'numpy.float64'>'
with 60 stored elements (blocksize = 1x1) in Block Sparse Row format>
<20x20 sparse matrix of type '<class 'numpy.float64'>'
with 106 stored elements (blocksize = 1x1) in Block Sparse Row format>
and shapes and checksums match.
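If the individual C_i blocks are needed, they can be sliced out of the sparse product the same way (a minimal sketch; converting to CSR first, since that format supports row/column slicing):
# Recover each C_i = B_i.T @ B_i as a diagonal block of the sparse product
Cs_csr = Cs.tocsr()
C_blocks = [Cs_csr[idx[i]:idx[i+1], idx[i]:idx[i+1]].toarray() for i in range(n)]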
For n = 10000, Bnn is too large for my memory. The sparse Bs creation is slow, but the matrix multiplication is fast.
Upvotes: 1