I have a sparse matrix z that is a scipy.sparse.csr_matrix and has shape (n,m) where n<<m. I also have labels l, which is simply a np.array of strings with size n.
What I'd like to do is make a csv file with the "ragged" version of the data, i.e. all of the nonzero values in z[0] would go in a column of the csv file with a header value l[0], but each column would have a different number of values. Unfortunately numpy doesn't deal with ragged arrays well, and I'm not sure what would be an elegant way to construct it.
Right now I'm just doing
np.savetxt(pth, z.todense().T, delimiter = ",")
and adding the column headers manually. My next processing step can handle all the zeros, but it is very slow that way.
EXAMPLE:
z.todense()
array([[0,0,1,0,0,-1,0,3,0,-6,4],
[-1,0,4,0,0,0,0,0,0,0,-2]])
l
array(["chan1", "chan2"])
What I want
example.csv
chan1, chan2
1,-1
-1,4
3,-2
-6,
4,
Upvotes: 0
Views: 2211
Reputation: 231550
In [74]: from scipy import sparse
In [75]: M = sparse.csr_matrix([[0,0,1,0,0,-1,0,3,0,-6,4],
    ...:                        [-1,0,4,0,0,0,0,0,0,0,-2]])
In [76]: M
Out[76]:
<2x11 sparse matrix of type '<class 'numpy.int64'>'
with 8 stored elements in Compressed Sparse Row format>
In [77]: M.A
Out[77]:
array([[ 0, 0, 1, 0, 0, -1, 0, 3, 0, -6, 4],
[-1, 0, 4, 0, 0, 0, 0, 0, 0, 0, -2]], dtype=int64)
The lil format gives the data by row:
In [78]: Ml = M.tolil()
In [79]: Ml.data
Out[79]: array([list([1, -1, 3, -6, 4]), list([-1, 4, -2])], dtype=object)
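The same per-row values can probably be read straight from the CSR arrays without converting to lil. The following is only a sketch under a few assumptions: that explicitly stored zeros should be dropped, that values should come out in column order, and that the name rows_as_lists (not part of the session above) is acceptable for the result.

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix([[0, 0, 1, 0, 0, -1, 0, 3, 0, -6, 4],
                       [-1, 0, 4, 0, 0, 0, 0, 0, 0, 0, -2]])

M.eliminate_zeros()  # drop explicitly stored zeros, if any (in place)
M.sort_indices()     # ensure values within each row are in column order (in place)

# M.data holds all stored values row after row;
# M.indptr[i]:M.indptr[i+1] delimits row i, so the inner
# entries of indptr are exactly the split points.
rows = np.split(M.data, M.indptr[1:-1])
rows_as_lists = [r.tolist() for r in rows]
print(rows_as_lists)
```

For the example matrix this should give the same two lists as Ml.data. Whether it is noticeably faster than tolil() for a given z is something to measure rather than assume.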
Now it's just a matter of writing those lists to file in the way you want:
In [81]: from itertools import zip_longest
In [82]: for i,j in zip_longest(*Ml.data, fillvalue=''):
    ...:     astr = '%s, %s'%(i,j)
    ...:     print(astr)
    ...:
1, -1
-1, 4
3, -2
-6,
4,
zip_longest is an easy way to iterate through several lists, using the longest as reference.
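The loop above prints the rows for two columns but leaves out the header line and the file itself. One way to cover both, for any number of columns, is the standard csv module. This is a minimal sketch: write_ragged_csv, columns and buf are names made up for illustration, and the column values are copied from Ml.data in the session above.

```python
import csv
import io
from itertools import zip_longest


def write_ragged_csv(fh, labels, columns):
    """Write one CSV column per entry of `columns`, padding short ones with ''."""
    writer = csv.writer(fh)
    writer.writerow(labels)  # header row: one label per column
    # zip_longest turns the ragged columns into rows, filling gaps with ''
    writer.writerows(zip_longest(*columns, fillvalue=""))


# Values copied from Ml.data; labels as in the question
columns = [[1, -1, 3, -6, 4], [-1, 4, -2]]
labels = ["chan1", "chan2"]

buf = io.StringIO()  # stand-in for open("example.csv", "w", newline="")
write_ragged_csv(buf, labels, columns)
csv_text = buf.getvalue()
print(csv_text)
```

To write a real file, pass a handle opened with newline="" instead of the StringIO buffer, and pass Ml.data and l directly as columns and labels.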