tsu90280

Reputation: 437

How to convert pandas dataframe to a sparse matrix using scipy's csr_matrix?

I want to convert a DataFrame to a sparse matrix using csr_matrix from the scipy library, but first I have to convert it to a SparseDataFrame. In previous versions of pandas I used pd.SparseDataFrame(df).to_coo() for this, but SparseDataFrame was removed in pandas 1.0.0. Does anyone know how to perform this conversion using the latest pandas API? I followed this migration guide and tried various combinations, but I still cannot get the desired result. Following the guide, when I run the following

csr_matrix(pd.DataFrame.sparse.from_spmatrix(df).to_coo())

I get this error

AttributeError: 'DataFrame' object has no attribute 'tocsc'

Can anyone help me solve this? I also found other posts, but they did not help in my case: link link link

Upvotes: 1

Views: 3264

Answers (1)

M_S_N

Reputation: 2810

IIUC, and using the third link you shared, you can convert your df to sparse data using pd.SparseDtype, like this:

df_sparsed = df.astype(pd.SparseDtype("float", np.nan))

You can read more about pd.SparseDtype here to choose the right parameters for your data, and then use it in your command above like this:

csr_matrix(df_sparsed.sparse.to_coo()) # Note you need .sparse accessor to access .to_coo()

A simple one-liner would be:

csr_matrix(df.astype(pd.SparseDtype("float", np.nan)).sparse.to_coo())
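
For reference, here is a minimal end-to-end sketch of the same approach with the imports included. The example DataFrame and its column names are made up for illustration, and it assumes your data is mostly zeros, so it uses a fill value of 0 instead of np.nan. As far as I know, some newer pandas versions may reject a non-zero fill value in .sparse.to_coo(), so 0 is the safer choice if zeros are what you want to leave unstored.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical example data: mostly zeros, a few non-zero entries.
df = pd.DataFrame(
    {"a": [0.0, 1.0, 0.0], "b": [2.0, 0.0, 0.0], "c": [0.0, 0.0, 3.0]}
)

# Store each column as a sparse array whose implicit (unstored) value is 0.
# Assumption: zeros are the values you want to drop from storage.
df_sparse = df.astype(pd.SparseDtype("float", 0))

# The .sparse accessor exposes to_coo(); csr_matrix accepts the COO result.
mat = csr_matrix(df_sparse.sparse.to_coo())

print(mat.shape, mat.nnz)
```

The error in the question comes from passing a DataFrame to from_spmatrix, which expects a scipy sparse matrix as input, so it goes in the opposite direction from what you need.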

Upvotes: 3
