Reputation: 127
I have a very large sparse matrix of type 'scipy.sparse.coo.coo_matrix'. I can convert it to CSR with .tocsr(), but .todense() will not work because the array is too large. I want to extract elements from the matrix as I would with a regular array, so that I can pass row elements to a function.
For reference, when printed, the matrix looks as follows:
(7, 0) 0.531519363001
(48, 24) 0.400946334437
(70, 6) 0.684460955022
...
Upvotes: 9
Views: 11996
Reputation: 3623
We can convert a scipy.sparse.coo_array to a pandas.DataFrame.
Utility function:
from scipy.sparse import coo_array
import pandas as pd
def coo_to_dataframe(array: coo_array) -> pd.DataFrame:
    """Convert scipy COO sparse array to a pandas data frame."""
    labels = array.data
    columns = array.col
    rows = array.row
    data_frame = pd.DataFrame({"x": columns, "y": rows, "label": labels})
    return data_frame
Create a sparse array (borrowed from @hpaulj):
sparse_array = coo_array(([.5, .4, .6], ([0 , 1, 2], [0, 5, 3])), shape=(5, 7))
For a small example array, we can view it as a dense array:
sparse_array.toarray()
array([[0.5, 0. , 0. , 0. , 0. , 0. , 0. ],
       [0. , 0. , 0. , 0. , 0. , 0.4, 0. ],
       [0. , 0. , 0. , 0.6, 0. , 0. , 0. ],
       [0. , 0. , 0. , 0. , 0. , 0. , 0. ],
       [0. , 0. , 0. , 0. , 0. , 0. , 0. ]])
Finally, we convert the sparse array to a DataFrame and plot it.
dataframe = coo_to_dataframe(sparse_array)
dataframe.plot.scatter("x", "y", title="Sparse labels")
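Since the question asks about passing row elements to a function, here is a rough sketch of how the same long-format frame could be grouped by row index. It rebuilds the frame inline so it runs on its own, and the choice of sum as the per-row function is only an illustrative assumption; rows with no stored entries simply do not appear in the result.

```python
import pandas as pd
from scipy.sparse import coo_array

# Same small example array as above.
sparse_array = coo_array(([.5, .4, .6], ([0, 1, 2], [0, 5, 3])), shape=(5, 7))

# Long-format frame: one record per stored element ("y" holds the row index).
dataframe = pd.DataFrame(
    {"x": sparse_array.col, "y": sparse_array.row, "label": sparse_array.data}
)

# Apply a function (here: sum, chosen only for illustration) to the
# stored values of each row. Empty rows (3 and 4) are absent from the result.
row_totals = dataframe.groupby("y")["label"].sum()
print(row_totals)
```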
Upvotes: 0
Reputation: 231385
Make a matrix with 3 elements (sparse here is scipy.sparse, imported with from scipy import sparse):
In [550]: M = sparse.coo_matrix(([.5,.4,.6],([0,1,2],[0,5,3])), shape=(5,7))
Its default display (repr(M)):
In [551]: M
Out[551]:
<5x7 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in COOrdinate format>
and print display (str(M)) - looks like the input:
In [552]: print(M)
(0, 0) 0.5
(1, 5) 0.4
(2, 3) 0.6
Convert to csr format:
In [553]: Mc=M.tocsr()
In [554]: Mc[1,:] # row 1 is another matrix (1 row):
Out[554]:
<1x7 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>
In [555]: Mc[1,:].A # that row as a 2d array (same as Mc[1,:].toarray())
Out[555]: array([[ 0. , 0. , 0. , 0. , 0. , 0.4, 0. ]])
In [556]: print(Mc[1,:]) # like 2nd element of M except for row number
(0, 5) 0.4
Individual element:
In [560]: Mc[1,5]
Out[560]: 0.40000000000000002
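To connect this back to the question (passing row elements to a function), a minimal sketch could loop over the rows of the csr matrix and densify only one row at a time, so the full matrix is never expanded. The function row_sum_of_squares is a hypothetical stand-in for whatever function you actually want to call.

```python
import numpy as np
from scipy import sparse

def row_sum_of_squares(row):
    """Hypothetical stand-in for a function that consumes one dense row."""
    return float(np.sum(row ** 2))

M = sparse.coo_matrix(([.5, .4, .6], ([0, 1, 2], [0, 5, 3])), shape=(5, 7))
Mc = M.tocsr()  # csr supports row indexing; coo does not

results = []
for i in range(Mc.shape[0]):
    # Only this single row is converted to a dense 1-D array of length 7.
    dense_row = Mc[i, :].toarray().ravel()
    results.append(row_sum_of_squares(dense_row))
```

Each dense row costs memory proportional to the number of columns, so this only helps when a single row fits comfortably in memory.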
The data attributes of these formats (if you want to dig further):
In [562]: Mc.data
Out[562]: array([ 0.5, 0.4, 0.6])
In [563]: Mc.indices
Out[563]: array([0, 5, 3], dtype=int32)
In [564]: Mc.indptr
Out[564]: array([0, 1, 2, 3, 3, 3], dtype=int32)
In [565]: M.data
Out[565]: array([ 0.5, 0.4, 0.6])
In [566]: M.col
Out[566]: array([0, 5, 3], dtype=int32)
In [567]: M.row
Out[567]: array([0, 1, 2], dtype=int32)
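As a sketch of how the csr attributes fit together: indptr[i] and indptr[i+1] delimit the slice of indices and data that belongs to row i, so the stored elements of a row can be read without building an intermediate matrix. The helper name row_nonzeros is made up for this illustration.

```python
import numpy as np
from scipy import sparse

M = sparse.coo_matrix(([.5, .4, .6], ([0, 1, 2], [0, 5, 3])), shape=(5, 7))
Mc = M.tocsr()

def row_nonzeros(csr, i):
    """Return (column indices, values) of the stored elements in row i.

    Hypothetical helper: slices the raw csr arrays using indptr.
    """
    start, end = csr.indptr[i], csr.indptr[i + 1]
    return csr.indices[start:end], csr.data[start:end]

cols, vals = row_nonzeros(Mc, 1)              # row 1 has one stored element
empty_cols, empty_vals = row_nonzeros(Mc, 3)  # row 3 has none
```

The returned arrays are views into the matrix's own storage, so copy them before modifying if the matrix must stay unchanged.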
Upvotes: 13