I would like to efficiently create the following sparse matrix of dimension (s, n1+n2):
v0 v1
v0 v2
v0 v3
...
v0 vs
given a sparse vector v0 of shape (1, n1) and a list of sparse vectors of shape (1, n2), l = [v1, ... , vs].
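To pin down the target layout, here is the same construction with small dense NumPy arrays standing in for the sparse vectors. The dense stand-ins and the sizes are assumptions for illustration only. This shows what row i should contain and says nothing about sparse efficiency:

```python
import numpy as np

s, n1, n2 = 4, 3, 5
rng = np.random.default_rng(0)

# Dense stand-ins (assumption) for the sparse row vectors v0 and v1..vs
v0 = rng.random((1, n1))
l = [rng.random((1, n2)) for _ in range(s)]

# Row i of the target is [v0, l[i]]: v0 repeated s times on the left,
# the rows of l stacked on the right
target = np.hstack((np.tile(v0, (s, 1)), np.vstack(l)))
print(target.shape)  # (4, 8)
```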
I have tried to use coo_matrix(), but it was unsuccessful; it seems to work only with dense vectors:
```python
left = coo_matrix(np.repeat(v0, s))
right = coo_matrix(l)
m = hstack((left, right))
```
Edit 1:
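I have found a workaround, but it does not seem very efficient:
```python
from scipy.sparse import hstack, vstack

right = vstack(l)                 # stack v1..vs into an (s, n2) matrix
left = vstack([v0] * len(l))      # repeat v0 once per row: (s, n1)
m = hstack((left, right))         # join side by side: (s, n1 + n2)
```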
Edit 2:
This is an example (not working) to help you understand the situation:
```python
from scipy.sparse import random, coo_matrix, hstack
from numpy import repeat

s = 10
n1 = 3
n2 = 5
v0 = random(1, n1)
l = [random(1, n2) for i in range(s)]
left = coo_matrix(repeat(v0, s))
right = coo_matrix(l)
m = hstack((left, right))
```
Answer:
```python
In [1]: from scipy import sparse
In [2]: s, n1, n2 = 10,3,5
In [3]: v0 = sparse.random(1, n1)
In [4]: v0
Out[4]:
<1x3 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in COOrdinate format>
In [5]: l = [sparse.random(1, n2) for i in range(s)]
In [6]: l
Out[6]:
[<1x5 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in COOrdinate format>,
...
<1x5 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in COOrdinate format>]
```
Instead of np.repeat, use sparse.vstack to create a stack of v0 copies:
```python
In [7]: V0 = sparse.vstack([v0]*s)
In [8]: V0
Out[8]:
<10x3 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in COOrdinate format>
```
Similarly, convert the list of (1, n2) matrices into one matrix:
```python
In [10]: V1 = sparse.vstack(l)
In [11]: V1
Out[11]:
<10x5 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in COOrdinate format>
```
Now join them:
```python
In [12]: m = sparse.hstack((V0,V1))
In [13]: m
Out[13]:
<10x8 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in COOrdinate format>
```
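Every repr above reports 0 stored elements because `sparse.random` defaults to `density=0.01`, which rounds to zero entries for matrices this small. A self-contained sketch of the same three steps, with an arbitrarily chosen `density=0.5` (an assumption, so that entries are actually stored), lets you check the result against the dense layout:

```python
import numpy as np
from scipy import sparse

s, n1, n2 = 10, 3, 5

# density=0.5 is an arbitrary choice so the rows actually hold entries
v0 = sparse.random(1, n1, density=0.5, format="csr")
l = [sparse.random(1, n2, density=0.5, format="csr") for _ in range(s)]

V0 = sparse.vstack([v0] * s)               # (s, n1): v0 repeated in every row
V1 = sparse.vstack(l)                      # (s, n2): v1..vs stacked
m = sparse.hstack((V0, V1), format="csr")  # (s, n1 + n2)
print(m.shape)  # (10, 8)
```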
I won't make any claims about this being efficient. hstack and vstack use bmat (check their code). bmat collects the coo attributes (row, col, data) of all the blocks and joins them, with offsets, into the inputs of a new coo_matrix call (again, the code is readable). So you could avoid some intermediate conversions by using bmat directly, or even by working with the coo attributes yourself. But hstack and vstack are relatively intuitive.
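Both alternatives mentioned above can be sketched as follows. Option 1 passes bmat one block-row `[v0, vi]` per output row. Option 2 assembles the row/col/data arrays by hand, repeating v0's entries for every row and shifting each vi's columns right by n1. `stack_with_prefix` is a hypothetical helper name, not a SciPy function. The random test data with `density=0.5` is an assumption, and no timing claims are made for either option:

```python
import numpy as np
from scipy import sparse

s, n1, n2 = 10, 3, 5
v0 = sparse.random(1, n1, density=0.5, format="coo")
l = [sparse.random(1, n2, density=0.5, format="coo") for _ in range(s)]

# Option 1: call bmat directly, one block-row [v0, v_i] per output row.
m_bmat = sparse.bmat([[v0, vi] for vi in l], format="csr")


# Option 2: build the COO attributes by hand (hypothetical helper).
def stack_with_prefix(v0, l):
    """Return an (s, n1 + n2) COO matrix whose row i is [v0, l[i]].

    Assumes v0 is 1 x n1, every element of l is 1 x n2, and l is non-empty.
    """
    v0 = v0.tocoo()
    blocks = [x.tocoo() for x in l]
    s, n1, n2 = len(blocks), v0.shape[1], blocks[0].shape[1]

    # Left block: v0's entries repeated once for every output row.
    left_rows = np.repeat(np.arange(s), v0.nnz)
    left_cols = np.tile(v0.col, s)
    left_data = np.tile(v0.data, s)

    # Right block: row i holds the entries of l[i], shifted right by n1 columns.
    right_rows = np.concatenate([np.full(b.nnz, i) for i, b in enumerate(blocks)])
    right_cols = np.concatenate([b.col for b in blocks]) + n1
    right_data = np.concatenate([b.data for b in blocks])

    rows = np.concatenate([left_rows, right_rows])
    cols = np.concatenate([left_cols, right_cols])
    data = np.concatenate([left_data, right_data])
    return sparse.coo_matrix((data, (rows, cols)), shape=(s, n1 + n2))


m_coo = stack_with_prefix(v0, l)
print(m_bmat.shape, m_coo.shape)  # (10, 8) (10, 8)
```

Option 2 skips the intermediate V0 and V1 matrices entirely. Whether that pays off in practice would need measuring on your own data.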