user7289

Reputation: 34398

Convert Pandas dataframe to Sparse Numpy Matrix directly

I am creating a matrix from a Pandas dataframe as follows:

dense_matrix = np.array(df.to_numpy(), dtype=bool).astype(int)

And then into a sparse matrix with:

sparse_matrix = scipy.sparse.csr_matrix(dense_matrix)

Is there any way to go from a df straight to a sparse matrix?

Thanks in advance.

Upvotes: 61

Views: 83342

Answers (3)

Justin Silva

Reputation: 19

Solution:

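import pandas as pd
from scipy.sparse import csr_matrix

# Convert to a sparse dtype first, then go DataFrame -> COO -> CSR.
# The result gets its own name so it does not shadow the imported csr_matrix.
sparse_matrix = csr_matrix(df.astype(pd.SparseDtype("float64", 0)).sparse.to_coo())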

Explanation:

to_coo is reached through the .sparse accessor, which only works when every column has a sparse dtype, so the DataFrame must be converted first: df.astype(pd.SparseDtype("float64", 0))

After it is converted to a COO matrix, it can be converted to a CSR matrix.
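The steps described above can be sketched end to end as follows. This is a minimal illustration, not a benchmark: the frame `df` and its column names are made up for the example, and the fill value of 0 is assumed to be the "empty" entry.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical example data: a small frame that is mostly zeros.
df = pd.DataFrame({"a": [0, 1, 0], "b": [0, 0, 2], "c": [3, 0, 0]})

# Step 1: give every column a sparse dtype with 0 as the fill value.
sparse_df = df.astype(pd.SparseDtype("float64", 0))

# Step 2: the .sparse accessor can now build a SciPy COO matrix.
coo = sparse_df.sparse.to_coo()

# Step 3: convert COO to CSR.
csr = csr_matrix(coo)

print(csr.shape, csr.nnz)
```

Only the nonzero entries are stored, so `csr.nnz` counts the values that differ from the fill value.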

Upvotes: 1

G. Cohen

Reputation: 620

If the DataFrame already has sparse columns, there is a way to do it without converting to dense en route: csr_sparse_matrix = df.sparse.to_coo().tocsr()
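Since the `.sparse` accessor is only available on frames whose columns all have a sparse dtype, one possible way to use this one-liner safely is to check the dtypes first. The helper name `frame_to_csr` below is hypothetical, and the choice of "float64" with a fill value of 0 is an assumption for the sketch.

```python
import pandas as pd

def frame_to_csr(df):
    """Hypothetical helper: return a SciPy CSR matrix for a DataFrame."""
    # The .sparse accessor requires every column to have a sparse dtype.
    all_sparse = all(isinstance(t, pd.SparseDtype) for t in df.dtypes)
    if not all_sparse:
        # Assumption: zeros are the entries that should not be stored.
        df = df.astype(pd.SparseDtype("float64", 0))
    return df.sparse.to_coo().tocsr()

# Hypothetical example data, once dense and once already sparse.
dense_df = pd.DataFrame({"a": [0.0, 1.0], "b": [2.0, 0.0]})
sparse_df = dense_df.astype(pd.SparseDtype("float64", 0))

from_dense = frame_to_csr(dense_df)
from_sparse = frame_to_csr(sparse_df)
```

Both calls should yield the same matrix; the dense frame simply takes one extra conversion step.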

Upvotes: 4

Dan Allan

Reputation: 35265

df.values is already a NumPy array, so there is no need to wrap it in np.array, which by default makes an extra copy.

scipy.sparse.csr_matrix(df.values)

df.values has shape (rows, columns), the same as the DataFrame, so a transpose is usually unnecessary. Use df.values.T only if you want the DataFrame's columns to become the rows of the matrix.
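To check which orientation you get, here is a minimal sketch of the dense route. It uses `df.to_numpy()`, which the pandas documentation recommends over `df.values`, and it mirrors the question by mapping every nonzero entry to 1. The frame and its column names are invented for the example.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical example data: 3 rows, 2 columns.
df = pd.DataFrame({"x": [0, 5, 0], "y": [7, 0, 0]})

# Mirror the question: treat any nonzero entry as 1.
indicator = (df.to_numpy() != 0).astype(np.int64)

# Build the sparse matrix from the dense indicator array.
sparse_matrix = csr_matrix(indicator)

# Compare the two shapes to confirm the orientation.
print(df.shape, sparse_matrix.shape)
```

Note that this route still materializes the full dense array in memory before building the sparse matrix, so it may not suit very large frames.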

Upvotes: 74
