ealain

Reputation: 235

Create a sparse matrix from a list of rows (sparse vectors)

I would like to efficiently create the following sparse matrix of dimension (s, n1+n2):

v0 v1 
v0 v2 
v0 v3 
 ... 
v0 vs

given sparse vector v0 (1, n1) and a list of sparse vectors (1, n2) l = [v1, ... , vs].

I tried coo_matrix(), but it failed; it seems to work only when given dense vectors:

left = coo_matrix(np.repeat(v0, s))
right = coo_matrix(l)
m = hstack((left, right))

Edit 1:

I have found a workaround that does not seem very efficient:

right = vstack([x for x in l])
left = vstack([v0 for i in range(len(l))])
m = hstack((left, right))

Edit 2:

Here is a minimal (non-working) example of the setup:

from scipy.sparse import random, coo_matrix, hstack
from numpy import repeat

s = 10
n1 = 3
n2 = 5

v0 = random(1, n1)
l = [random(1, n2) for i in range(s)]

left = coo_matrix(repeat(v0, s))
right = coo_matrix(l)
m = hstack((left, right))

Upvotes: 2

Views: 1311

Answers (1)

hpaulj

Reputation: 231325

In [1]: from scipy import sparse

In [2]: s, n1, n2 = 10,3,5
In [3]: v0 = sparse.random(1, n1)
In [4]: v0
Out[4]: 
<1x3 sparse matrix of type '<class 'numpy.float64'>'
    with 0 stored elements in COOrdinate format>
In [5]: l = [sparse.random(1, n2) for i in range(s)]
In [6]: l
Out[6]: 
[<1x5 sparse matrix of type '<class 'numpy.float64'>'
    with 0 stored elements in COOrdinate format>,
  ...
 <1x5 sparse matrix of type '<class 'numpy.float64'>'
    with 0 stored elements in COOrdinate format>]

Instead of np.repeat, use sparse.vstack to create a stack of v0 copies:

In [7]: V0 = sparse.vstack([v0]*s)
In [8]: V0
Out[8]: 
<10x3 sparse matrix of type '<class 'numpy.float64'>'
    with 0 stored elements in COOrdinate format>

Similarly, convert the list of s matrices of shape (1, n2) into one matrix:

In [10]: V1 = sparse.vstack(l)
In [11]: V1
Out[11]: 
<10x5 sparse matrix of type '<class 'numpy.float64'>'
    with 0 stored elements in COOrdinate format>

Now join them:

In [12]: m = sparse.hstack((V0,V1))
In [13]: m
Out[13]: 
<10x8 sparse matrix of type '<class 'numpy.float64'>'
    with 0 stored elements in COOrdinate format>
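
The session above shows 0 stored elements everywhere because sparse.random defaults to density=0.01, which rounds to zero entries for vectors this small. As a minimal sketch, the same three steps can be checked on non-empty inputs and compared against a dense reference. The helper name stack_with_prefix and the density and random_state values are assumptions chosen for illustration, not part of the original question.

```python
import numpy as np
from scipy import sparse

def stack_with_prefix(v0, rows):
    """Return the (s, n1 + n2) matrix whose i-th row is [v0, rows[i]]."""
    left = sparse.vstack([v0] * len(rows))   # (s, n1): s copies of v0
    right = sparse.vstack(rows)              # (s, n2): one row per vector
    return sparse.hstack((left, right)).tocsr()

s, n1, n2 = 10, 3, 5
# density=0.5 so that the vectors actually contain stored elements
v0 = sparse.random(1, n1, density=0.5, random_state=0)
l = [sparse.random(1, n2, density=0.5, random_state=i) for i in range(s)]

m = stack_with_prefix(v0, l)

# Dense reference built with plain NumPy, only for checking the result
expected = np.hstack([np.repeat(v0.toarray(), s, axis=0),
                      np.vstack([v.toarray() for v in l])])
assert m.shape == (s, n1 + n2)
assert np.array_equal(m.toarray(), expected)
```

The final tocsr() is optional; it is included on the assumption that the matrix will be used for row slicing or arithmetic afterwards.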

I won't make any claims about the efficiency of this. hstack and vstack use bmat (check their code). bmat collects the coo attributes (data, row, col) of all the blocks and joins them, with offsets, into the inputs to a new coo_matrix call (again, the code is readable). So you could avoid some intermediate conversions by calling bmat directly, or even by building the coo attributes yourself. But hstack and vstack are relatively intuitive.
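
The two alternatives mentioned here could look roughly like the sketch below: one bmat call over an s x 2 grid of blocks, and a hand-built coo_matrix where the column indices of the right-hand block are offset by n1. The function names stack_with_bmat and stack_with_coo are hypothetical, the code assumes at least one row vector, and neither version has been timed, so whether they are actually faster will depend on the data and the SciPy version.

```python
import numpy as np
from scipy import sparse

def stack_with_bmat(v0, rows):
    # One bmat call over the block grid [[v0, v1], [v0, v2], ...]
    return sparse.bmat([[v0, v] for v in rows], format="coo")

def stack_with_coo(v0, rows):
    # Build the data/row/col arrays of the result directly (assumes len(rows) >= 1)
    s = len(rows)
    n1, n2 = v0.shape[1], rows[0].shape[1]
    v0 = v0.tocoo()
    rows = [v.tocoo() for v in rows]
    # Left block: the entries of v0 repeated once per output row
    left_row = np.repeat(np.arange(s), v0.nnz)
    left_col = np.tile(v0.col, s)
    left_data = np.tile(v0.data, s)
    # Right block: entries of rows[i] go to row i, columns shifted by n1
    right_row = np.concatenate([np.full(v.nnz, i) for i, v in enumerate(rows)])
    right_col = np.concatenate([v.col + n1 for v in rows])
    right_data = np.concatenate([v.data for v in rows])
    return sparse.coo_matrix(
        (np.concatenate([left_data, right_data]),
         (np.concatenate([left_row, right_row]),
          np.concatenate([left_col, right_col]))),
        shape=(s, n1 + n2))

s, n1, n2 = 10, 3, 5
v0 = sparse.random(1, n1, density=0.5, random_state=0)
l = [sparse.random(1, n2, density=0.5, random_state=i) for i in range(s)]

# Both variants should agree with the hstack/vstack result
reference = sparse.hstack((sparse.vstack([v0] * s), sparse.vstack(l)))
for build in (stack_with_bmat, stack_with_coo):
    assert np.array_equal(build(v0, l).toarray(), reference.toarray())
```

The direct coo version avoids creating the intermediate (s, n1) and (s, n2) matrices, at the cost of being harder to read than the hstack/vstack form.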

Upvotes: 2
