Daniel F

Reputation: 14399

Sparse matrix output to csv

I have a sparse matrix z that is a scipy.sparse.csr_matrix with shape (n, m), where n << m. I also have labels l, which is simply a np.array of strings of size n.

What I'd like to do is make a csv file with a "ragged" version of the data, i.e. all of the nonzero values in z[0] would go in one column of the csv file under the header l[0], and each column would hold a different number of values. Unfortunately numpy doesn't deal with ragged arrays well, and I'm not sure what would be an elegant way to construct it.

Right now I'm just doing

np.savetxt(pth, z.todense().T, delimiter = ",")

and adding the column headers manually. My next processing step can handle all the zeros, but it is very slow that way.

EXAMPLE:

z.todense()
array([[0,0,1,0,0,-1,0,3,0,-6,4],
       [-1,0,4,0,0,0,0,0,0,0,-2]])

l
array(["chan1", "chan2"])

What I want

example.csv

chan1, chan2
1,-1
-1,4
3,-2
-6,
4,

Upvotes: 0

Views: 2211

Answers (1)

hpaulj

Reputation: 231550

In [74]: from scipy import sparse

In [75]: M = sparse.csr_matrix([[0,0,1,0,0,-1,0,3,0,-6,4],
    ...:        [-1,0,4,0,0,0,0,0,0,0,-2]])
In [76]: M
Out[76]: 
<2x11 sparse matrix of type '<class 'numpy.int64'>'
    with 8 stored elements in Compressed Sparse Row format>

In [77]: M.toarray()
Out[77]: 
array([[ 0,  0,  1,  0,  0, -1,  0,  3,  0, -6,  4],
       [-1,  0,  4,  0,  0,  0,  0,  0,  0,  0, -2]], dtype=int64)

lil format gives the data by row:

In [78]: Ml = M.tolil()
In [79]: Ml.data
Out[79]: array([list([1, -1, 3, -6, 4]), list([-1, 4, -2])], dtype=object)
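
As an aside, the same per-row lists can probably be obtained without converting to lil, because a CSR matrix already stores its values row by row. The sketch below is an alternative, not what the session above does: it splits M.data at the row boundaries recorded in M.indptr. The name columns is only illustrative, and it assumes the matrix holds no explicitly stored zeros.

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix([[0, 0, 1, 0, 0, -1, 0, 3, 0, -6, 4],
                       [-1, 0, 4, 0, 0, 0, 0, 0, 0, 0, -2]])

# In CSR, the stored values of row i are M.data[M.indptr[i]:M.indptr[i + 1]],
# so the interior entries of indptr are exactly the split points between rows.
per_row = np.split(M.data, M.indptr[1:-1])

# Convert each slice to a plain Python list (illustrative name: columns).
columns = [row.tolist() for row in per_row]
print(columns)
```

For a matrix with many rows this avoids building the lil structure, though for the two-row example the difference is negligible.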

Now it's just a matter of writing those lists to file in the way you want:

In [81]: from itertools import zip_longest

In [82]: for i,j in zip_longest(*Ml.data, fillvalue=''):
    ...:     astr = '%s, %s'%(i,j)
    ...:     print(astr)
    ...:     
1, -1
-1, 4
3, -2
-6, 
4, 

zip_longest is an easy way to iterate through several lists in parallel: it keeps going until the longest list is exhausted and pads the shorter ones with fillvalue.
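
Putting the pieces together, a minimal sketch of writing the file asked for in the question might look like the following. The function name write_ragged_csv and the output file name are assumptions made for illustration; the code uses the standard csv module instead of string formatting, so the output has no space after the comma.

```python
import csv
from itertools import zip_longest

import numpy as np
from scipy import sparse


def write_ragged_csv(path, z, labels):
    """Write the stored values of each row of z as one CSV column."""
    rows = z.tolil().data  # one list of stored values per matrix row
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(labels)  # header line: one label per matrix row
        # zip_longest transposes the ragged lists, padding short ones with ''
        writer.writerows(zip_longest(*rows, fillvalue=""))


z = sparse.csr_matrix([[0, 0, 1, 0, 0, -1, 0, 3, 0, -6, 4],
                       [-1, 0, 4, 0, 0, 0, 0, 0, 0, 0, -2]])
l = np.array(["chan1", "chan2"])
write_ragged_csv("example.csv", z, l)
```

Note that zip_longest(*rows) unpacks one argument per matrix row, which is fine here because n is small compared to m.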

Upvotes: 1
