Han Fang

Reputation: 41

Convert Pandas SparseDataframe to Scipy sparse csc_matrix

I want to convert a pandas SparseDataFrame to a scipy.sparse.csc_matrix without converting it back to a dense matrix first.

Right now I have something like this:

df = pd.get_dummies(df, sparse=True)

What I need is to go one step further and get a scipy.sparse.csc_matrix from df. Is there a way to do it?

Upvotes: 3

Views: 2814

Answers (2)

Han Fang

Reputation: 41

Thanks to @hpaulj's reply, I ended up using the template from https://stackoverflow.com/a/38157234/7298911.

Here is the modified implementation.

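import numpy as np
import pandas as pd
from scipy import sparse

def sparseDfToCsc(df):
    """Convert a (pre-1.0 pandas) SparseDataFrame to a CSC matrix without densifying."""
    columns = df.columns
    # Per column: the stored values (shifted by the fill value) and their integer row positions.
    dat, rows = map(list, zip(*[(df[col].sp_values - df[col].fill_value,
                                 df[col].sp_index.to_int_index().indices)
                                for col in columns]))
    # Column number i repeated once per stored entry of column i (integer, like the row indices).
    cols = [np.full_like(r, i) for (i, r) in enumerate(rows)]
    datF, rowsF, colsF = np.concatenate(dat), np.concatenate(rows), np.concatenate(cols)
    arr = sparse.coo_matrix((datF, (rowsF, colsF)), shape=df.shape, dtype=np.float64)
    return arr.tocsc()

df = pd.get_dummies(df, sparse=True)
cscMatrix = sparseDfToCsc(df)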
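On pandas 1.0 and later, SparseDataFrame no longer exists (sparse columns live in ordinary DataFrames with a SparseDtype), so the function above will not run there. Below is a minimal sketch of the same column-by-column idea against the newer API. The name sparse_df_to_csc is hypothetical, and the sketch assumes at least one column, every column sparse, and a fill value of 0 (as with get_dummies on purely categorical input).

```python
import numpy as np
import pandas as pd
from scipy import sparse

def sparse_df_to_csc(df: pd.DataFrame) -> sparse.csc_matrix:
    """Hypothetical helper: build a CSC matrix from all-sparse columns, never densifying.

    Assumes pandas >= 1.0, every column has a SparseDtype, and each fill value is 0.
    """
    data, rows, cols = [], [], []
    for i, col in enumerate(df.columns):
        arr = df[col].array  # a pandas SparseArray
        if arr.fill_value != 0:
            raise ValueError(f"column {col!r} has a non-zero fill value")
        idx = arr.sp_index.to_int_index().indices  # row positions of the stored values
        data.append(np.asarray(arr.sp_values, dtype=np.float64))
        rows.append(idx)
        cols.append(np.full(len(idx), i, dtype=idx.dtype))  # column number for each stored value
    coo = sparse.coo_matrix(
        (np.concatenate(data), (np.concatenate(rows), np.concatenate(cols))),
        shape=df.shape,
    )
    return coo.tocsc()

df = pd.get_dummies(pd.DataFrame({"a": ["x", "y", "x"], "b": ["p", "p", "q"]}), sparse=True)
m = sparse_df_to_csc(df)
```

Each column contributes only its stored entries, so memory stays proportional to the number of non-fill values rather than rows times columns.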

Upvotes: 1

hpaulj

Reputation: 231325

I've participated in various questions about converting sparse Pandas objects to scipy sparse matrices.

There is a Pandas method for converting a MultiIndex sparse Series to a coo matrix:

http://pandas-docs.github.io/pandas-docs-travis/sparse.html#interaction-with-scipy-sparse
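A minimal sketch of that MultiIndex route, assuming a recent pandas with the Series.sparse accessor; the toy Series and its level names "row" and "col" are made up for illustration. row_levels and column_levels choose which index levels become the matrix rows and columns, and the returned coo_matrix can then be converted to CSC.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a sparse Series indexed by (row, col) pairs, fill value 0.
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.MultiIndex.from_tuples([(0, "a"), (1, "b"), (2, "a")], names=["row", "col"]),
).astype(pd.SparseDtype("float64", 0.0))

# Level "row" -> matrix rows, level "col" -> matrix columns.
A, row_labels, col_labels = s.sparse.to_coo(
    row_levels=["row"], column_levels=["col"], sort_labels=True
)
csc = A.tocsc()
```

This only applies to a Series with a MultiIndex, not directly to a DataFrame.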

For a whole DataFrame, though, see "Pandas sparse dataFrame to sparse matrix, without generating a dense matrix in memory"

and

How do I create a scipy sparse matrix from a pandas dataframe?

and more recently, How can I "sparsify" on two values?

Once you have a coo matrix, you can easily convert it to csr or csc with its .tocsr() or .tocsc() method.
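For a DataFrame whose columns are all sparse, pandas 0.25 and later also offer DataFrame.sparse.to_coo(), which returns a scipy coo_matrix that can then go straight to CSC. A sketch, assuming a recent pandas; as far as I know to_coo raises an error if a column's fill value is not 0, which is fine for get_dummies output. The "color" column is made-up sample data.

```python
import pandas as pd

# Hypothetical sample data; get_dummies(sparse=True) yields all-sparse columns.
df = pd.get_dummies(pd.DataFrame({"color": ["red", "blue", "red", "green"]}), sparse=True)

coo = df.sparse.to_coo()  # scipy.sparse.coo_matrix, built without densifying
csc = coo.tocsc()         # coo -> csc conversion
```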

To avoid confusion, I'd suggest creating a sample dataframe and converting it to dense and then to sparse, so that we have something concrete to test. I used to recommend the Pandas method without realizing that it works on a MultiIndex Series, not on a DataFrame.
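Following that suggestion, a small self-check might look like this sketch (the sample values are made up): build a mostly-zero dense frame, convert it to sparse columns, go to CSC without a dense intermediate, and compare against the dense original.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, mostly zeros.
dense = pd.DataFrame({"a": [0, 1, 0, 0], "b": [2, 0, 0, 3], "c": [0, 0, 0, 4]})

# Dense -> sparse columns with fill value 0, then -> CSC.
sdf = dense.astype(pd.SparseDtype("int64", 0))
csc = sdf.sparse.to_coo().tocsc()

# Densify only for the comparison, which is fine on a toy example.
matches = np.array_equal(csc.toarray(), dense.to_numpy())
```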

Upvotes: 0
