Georg Heiler

Reputation: 17676

Create a row, column, data pandas DataFrame from a sparse matrix

How can I convert a sparse matrix in COO format into a pandas DataFrame that keeps the COO layout (one row per stored entry, with row, column, and data columns) instead of expanding it into a dense-shaped layout?

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from scipy.sparse import coo_matrix

a = np.eye(7)
a_csr = csr_matrix(a)
a_coo = a_csr.tocoo()
print(a_coo)
  (0, 0)    1.0
  (1, 1)    1.0
  (2, 2)    1.0
  (3, 3)    1.0
  (4, 4)    1.0
  (5, 5)    1.0
  (6, 6)    1.0

That is, how can I obtain a pandas DataFrame from this that is not expanded into the wide 7 x 7 layout produced by

pd.DataFrame.sparse.from_spmatrix(a_coo)

but instead keeps the row, column, data format shown in the printed output above?

Upvotes: 1

Views: 1508

Answers (2)

hpaulj

Reputation: 231385

The values you want to put in the dataframe are available as

a_coo.row, a_coo.col, a_coo.data
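
A minimal sketch of how those three attributes could be assembled into a DataFrame, assuming the same identity matrix as in the question. The column names (row, column, data) and the variables coo_df and rebuilt are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix, csr_matrix

# Same example matrix as in the question
a_coo = csr_matrix(np.eye(7)).tocoo()

# One DataFrame row per stored entry; the column names are illustrative
coo_df = pd.DataFrame({
    "row": a_coo.row,
    "column": a_coo.col,
    "data": a_coo.data,
})
print(coo_df)

# Round trip back to a sparse matrix; passing shape explicitly
# preserves any trailing all-zero rows or columns
rebuilt = coo_matrix(
    (coo_df["data"].to_numpy(),
     (coo_df["row"].to_numpy(), coo_df["column"].to_numpy())),
    shape=a_coo.shape,
)
```

Note that the original shape is not stored in the DataFrame itself, so it has to be kept separately if the matrix may end in empty rows or columns.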

Upvotes: 2

Georg Heiler

Reputation: 17676

One possible workaround is to serialize the matrix to the Matrix Market (.mtx) format and read the resulting file back as a CSV.

from scipy import io
io.mmwrite('sparse_thing', a_csr)
!cat sparse_thing.mtx

sparse_mtx_mm_df = pd.read_csv('sparse_thing.mtx', sep=' ', skiprows=3, header=None)
sparse_mtx_mm_df.columns = ['row', 'column', 'data_value']
sparse_mtx_mm_df

Is there a better (native, non-serialization-based) solution?

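To convert the DataFrame back into a sparse matrix (Matrix Market indices are 1-based, so they are shifted to 0-based here):

re_sparsed = coo_matrix((sparse_mtx_mm_df['data_value'].values, (sparse_mtx_mm_df['row'].values - 1, sparse_mtx_mm_df['column'].values - 1)))
re_sparsed.todense()

This gives back the initial array in dense form.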
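
A hedged variant of the same serialization idea, as a sketch only: it writes the Matrix Market text to an in-memory buffer instead of a file, and skips the '%' header lines via read_csv's comment parameter rather than a fixed skiprows count. The names buf, raw, and entries, and the choice of symmetry="general", are assumptions made for this illustration.

```python
from io import BytesIO

import numpy as np
import pandas as pd
from scipy.io import mmwrite
from scipy.sparse import coo_matrix, csr_matrix

a_csr = csr_matrix(np.eye(7))

# Write the Matrix Market text into memory instead of a file on disk;
# symmetry="general" asks for every stored entry to be written out
buf = BytesIO()
mmwrite(buf, a_csr, symmetry="general")
buf.seek(0)

# Lines starting with '%' are header/comment lines and are skipped
raw = pd.read_csv(buf, sep=" ", comment="%", header=None,
                  names=["row", "column", "data_value"])

# The first remaining line is the size line: n_rows n_cols n_entries
n_rows, n_cols = int(raw.iloc[0, 0]), int(raw.iloc[0, 1])
entries = raw.iloc[1:]

# Matrix Market indices are 1-based, SciPy expects 0-based
re_sparsed = coo_matrix(
    (entries["data_value"].to_numpy(),
     (entries["row"].to_numpy() - 1, entries["column"].to_numpy() - 1)),
    shape=(n_rows, n_cols),
)
```

Reading the size line also recovers the matrix shape, which the plain row, column, data triples do not carry.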

Upvotes: 0
