Reputation: 3420
I have a generator that generates single dimension numpy.arrays of the same length. I would like to have a sparse matrix containing that data. Rows are generated in the same order I'd like to have them in the final matrix. csr matrix is preferable over lil matrix, but I assume the latter will be easier to build in the scenario I'm describing.
Assuming row_gen is a generator yielding numpy.array rows, the following code works as expected.
import numpy
import scipy.sparse

def row_gen():
    yield numpy.array([1, 2, 3])
    yield numpy.array([1, 0, 1])
    yield numpy.array([1, 0, 0])

matrix = scipy.sparse.lil_matrix(list(row_gen()))
Because the list will essentially ruin any advantages of the generator, I'd like the following to have the same end result. More specifically, I cannot hold the entire dense matrix (or a list of all matrix rows) in memory:
def row_gen():
    yield numpy.array([1, 2, 3])
    yield numpy.array([1, 0, 1])
    yield numpy.array([1, 0, 0])

matrix = scipy.sparse.lil_matrix(row_gen())
However it raises the following exception when run:
TypeError: no supported conversion for types: (dtype('O'),)
I also noticed the trace includes the following:
File "/usr/local/lib/python2.7/site-packages/scipy/sparse/lil.py", line 122, in __init__
A = csr_matrix(A, dtype=dtype).tolil()
This makes me think that scipy.sparse.lil_matrix ends up creating a csr matrix and only then converts it to a lil matrix. In that case I would rather just create a csr matrix to begin with.
To recap, my question is: What is the most efficient way to create a scipy.sparse matrix from a python generator or numpy single dimensional arrays?
Upvotes: 2
Views: 3182
Reputation: 231615
Let's look at the code for sparse.lil_matrix. It checks the first argument:
if isspmatrix(arg1):    # is it already a sparse matrix
    ...
elif isinstance(arg1,tuple):    # is it the shape tuple
    if isshape(arg1):
        if shape is not None:
            raise ValueError('invalid use of shape parameter')
        M, N = arg1
        self.shape = (M,N)
        self.rows = np.empty((M,), dtype=object)
        self.data = np.empty((M,), dtype=object)
        for i in range(M):
            self.rows[i] = []
            self.data[i] = []
    else:
        raise TypeError('unrecognized lil_matrix constructor usage')
else:
    # assume A is dense
    try:
        A = np.asmatrix(arg1)
    except TypeError:
        raise TypeError('unsupported matrix type')
    else:
        from .csr import csr_matrix
        A = csr_matrix(A, dtype=dtype).tolil()

        self.shape = A.shape
        self.dtype = A.dtype
        self.rows = A.rows
        self.data = A.data
As per the documentation, you can construct it from another sparse matrix, from a shape, or from a dense array. The dense array constructor first makes a csr matrix, and then converts it to lil.
The shape version constructs an empty lil with data like:
In [161]: M=sparse.lil_matrix((3,5),dtype=int)
In [163]: M.data
Out[163]: array([[], [], []], dtype=object)
In [164]: M.rows
Out[164]: array([[], [], []], dtype=object)
It should be obvious that passing a generator isn't going to work: it isn't a dense array.
But having created a lil matrix, you can fill in elements with a regular array assignment:
In [167]: M[0,:]=[1,0,2,0,0]
In [168]: M[1,:]=[0,0,2,0,0]
In [169]: M[2,3:]=[1,1]
In [170]: M.data
Out[170]: array([[1, 2], [2], [1, 1]], dtype=object)
In [171]: M.rows
Out[171]: array([[0, 2], [2], [3, 4]], dtype=object)
In [172]: M.A
Out[172]:
array([[1, 0, 2, 0, 0],
       [0, 0, 2, 0, 0],
       [0, 0, 0, 1, 1]])
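Applied to the generator from the question, the row assignment above could look roughly like the sketch below. It assumes the number of rows and columns is known before the generator is consumed; lil_from_rows is a hypothetical helper name, not part of scipy.

```python
import numpy as np
from scipy import sparse

def row_gen():
    # Same rows as in the question.
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def lil_from_rows(rows, n_rows, n_cols, dtype=int):
    """Hypothetical helper: fill a preallocated lil_matrix one row at a time."""
    M = sparse.lil_matrix((n_rows, n_cols), dtype=dtype)
    for i, row in enumerate(rows):
        # Only one dense row is alive at a time; zeros are not stored.
        M[i, :] = row
    return M

M = lil_from_rows(row_gen(), n_rows=3, n_cols=3)
csr = M.tocsr()  # convert once, after all rows have been filled in
```

Only one dense row exists at any moment, so the full dense matrix is never built. The conversion to csr happens once at the end.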
You can also assign values to the sublists directly (I think this is faster, but a little more dangerous, since nothing checks that the rows and data lists stay consistent):
In [173]: M.data[1]=[1,2,3]
In [174]: M.rows[1]=[0,2,4]
In [176]: M.A
Out[176]:
array([[1, 0, 2, 0, 0],
       [1, 0, 2, 0, 3],
       [0, 0, 0, 1, 1]])
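For generated rows, the direct sublist assignment might be sketched as follows. It again assumes the shape is known up front, and lil_from_rows_direct is a hypothetical name. np.flatnonzero picks out the column indices of the nonzero entries, which keeps each rows[i] list sorted.

```python
import numpy as np
from scipy import sparse

def row_gen():
    # Same rows as in the question.
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def lil_from_rows_direct(rows, n_rows, n_cols, dtype=int):
    """Hypothetical helper: write the lil sublists directly, bypassing indexing."""
    M = sparse.lil_matrix((n_rows, n_cols), dtype=dtype)
    for i, row in enumerate(rows):
        cols = np.flatnonzero(row)       # column indices of the nonzero entries
        M.rows[i] = cols.tolist()        # must stay sorted and within n_cols
        M.data[i] = row[cols].tolist()   # values in the same order as M.rows[i]
    return M

M = lil_from_rows_direct(row_gen(), n_rows=3, n_cols=3)
```

Nothing here validates the row length or the number of rows, so a row of the wrong size would silently produce an inconsistent matrix.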
Another incremental approach is to construct the 3 arrays or lists of coo format, and then make a coo or csr from those.
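A minimal sketch of that coo approach, assuming every row has the same length (csr_from_rows is a hypothetical helper name). Only the nonzero values and their indices are accumulated, and the number of rows does not need to be known in advance.

```python
import numpy as np
from scipy import sparse

def row_gen():
    # Same rows as in the question.
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def csr_from_rows(rows):
    """Hypothetical helper: collect coo triplets per row, then convert to csr."""
    data, row_idx, col_idx = [], [], []
    n_rows = n_cols = 0
    for i, row in enumerate(rows):
        cols = np.flatnonzero(row)             # nonzero columns of this row
        data.append(row[cols])                 # nonzero values
        col_idx.append(cols)
        row_idx.append(np.full(len(cols), i))  # row index repeated per value
        n_rows, n_cols = i + 1, len(row)
    if n_rows == 0:
        raise ValueError("the generator yielded no rows")
    coo = sparse.coo_matrix(
        (np.concatenate(data),
         (np.concatenate(row_idx), np.concatenate(col_idx))),
        shape=(n_rows, n_cols),
    )
    return coo.tocsr()

matrix = csr_from_rows(row_gen())
```

Passing the shape explicitly matters when trailing rows or columns are all zero, because otherwise the shape is inferred from the largest index seen.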
sparse.bmat is another option, and its code is a good example of building the coo inputs. I'll let you look at that yourself.
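As a rough illustration of using sparse.bmat itself, each generated row can be turned into a 1 x N sparse matrix as it arrives and the blocks stacked at the end. This is a sketch only; creating one sparse matrix per row carries overhead, so grouping rows into larger chunks may be worthwhile in practice.

```python
import numpy as np
from scipy import sparse

def row_gen():
    # Same rows as in the question.
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

# Each dense row becomes a 1 x N csr block right away, so only sparse
# data accumulates. bmat expects a 2-D grid of blocks: one block per grid row.
blocks = [[sparse.csr_matrix(row)] for row in row_gen()]
stacked = sparse.bmat(blocks, format='csr')
```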
Upvotes: 1