Bob

Reputation: 571

Add scipy sparse row matrix to another sparse matrix

I have a csr_matrix A of shape (70000, 80000) and another csr_matrix B of shape (1, 80000). How can I efficiently add B to every row of A? One idea is to create a sparse matrix B' that repeats the single row of B once per row of A, but numpy.repeat does not work on sparse matrices, and building B' from a dense matrix of ones is very memory inefficient.

I also tried iterating over every row of A and adding B to it, but that is very slow.

Update: I tried something very simple that seems to be much more efficient than the ideas I mentioned above. The idea is to use scipy.sparse.vstack:

C = sparse.vstack([B for x in range(A.shape[0])])
A + C

This performs well for my task! A few more realizations: I initially tried an iterative approach where I called vstack multiple times; that approach is slower than calling it just once.
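A small sanity check of the single-call vstack approach might look like the sketch below. The shapes, densities, and seeds are illustrative assumptions (far smaller than the matrices in my task), chosen so the result can be compared against dense broadcasting:

```python
import numpy as np
import scipy.sparse as sparse

# Small illustrative shapes (assumed for the demo, not the real sizes).
n_rows, n_cols = 5, 8
A = sparse.random(n_rows, n_cols, density=0.3, format="csr", random_state=0)
B = sparse.random(1, n_cols, density=0.5, format="csr", random_state=1)

# Stack one copy of B per row of A in a single vstack call.
C = sparse.vstack([B] * A.shape[0], format="csr")
result = A + C

# Dense broadcasting is affordable at this size and serves as the reference.
expected = A.toarray() + B.toarray()
vstack_matches_dense = np.allclose(result.toarray(), expected)
```

The dense comparison is only practical for small inputs; at (70000, 80000) the dense arrays would not fit comfortably in memory.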

Upvotes: 4

Views: 1238

Answers (1)

unutbu

Reputation: 879083

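A + B[np.zeros(A.shape[0], dtype=int)] is another way to expand B to the same shape as A. Every entry of the index array is 0, so the fancy index selects row 0 of B once for each row of A. Note that np.zeros returns float64 by default; passing dtype=int keeps the row indices explicitly integer-typed.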

It has about the same performance and memory footprint as Warren Weckesser's solution (multiplying a sparse column of ones by B, timed first below):

import numpy as np
import scipy.sparse as sparse

N, M = 70000, 80000
A = sparse.rand(N, M, density=0.001).tocsr()
B = sparse.rand(1, M, density=0.001).tocsr()

In [185]: %timeit u = sparse.csr_matrix(np.ones((A.shape[0], 1), dtype=B.dtype)); Bp = u * B; A + Bp
1 loops, best of 3: 284 ms per loop

In [186]: %timeit A + B[np.zeros(A.shape[0])]
1 loops, best of 3: 280 ms per loop

and appears to be faster than using sparse.vstack:

In [187]: %timeit A + sparse.vstack([B for x in range(A.shape[0])])
1 loops, best of 3: 606 ms per loop
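To check that the three approaches timed above produce the same matrix, a minimal sketch on small inputs could look like this. The sizes, densities, and seeds are assumptions chosen so a dense comparison is cheap; the index array uses dtype=int, and the product is written with @ rather than * to make the matrix multiplication explicit:

```python
import numpy as np
import scipy.sparse as sparse

# Small illustrative sizes (assumed; the timings above use 70000 x 80000).
N, M = 6, 10
A = sparse.random(N, M, density=0.3, format="csr", random_state=0)
B = sparse.random(1, M, density=0.5, format="csr", random_state=1)

# 1. Sparse column of ones times B: an (N, 1) @ (1, M) product repeats B.
u = sparse.csr_matrix(np.ones((N, 1), dtype=B.dtype))
via_ones = A + u @ B

# 2. Fancy indexing: select row 0 of B once per row of A.
via_index = A + B[np.zeros(N, dtype=int)]

# 3. Stack N copies of B with a single vstack call.
via_vstack = A + sparse.vstack([B] * N)

# Dense broadcasting gives the reference result at this small size.
expected = A.toarray() + B.toarray()
all_agree = all(
    np.allclose(r.toarray(), expected)
    for r in (via_ones, via_index, via_vstack)
)
```

This only confirms that the results agree; relative speed depends on the SciPy version and the sparsity pattern, so it is worth re-running the timings on your own data.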

Upvotes: 3
