Reputation: 41
I want to convert a pandas SparseDataFrame to a scipy.sparse.csc_matrix. But I don't want to convert it back to a dense matrix first.
Right now I have something like the below.
df = pd.get_dummies(df, sparse=True)
Basically, what I need is to go one step further and get a scipy.sparse.csc_matrix from df. Is there a way to do it?
Upvotes: 3
Views: 2814
Reputation: 41
Thanks to @hpaulj's reply, I ended up using the template from https://stackoverflow.com/a/38157234/7298911.
Here is the modified implementation.
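import numpy as np
import pandas as pd
from scipy import sparse

def sparseDfToCsc(df):
    columns = df.columns
    # Per column: the stored values (offset by fill_value) and their row indices.
    dat, rows = map(list, zip(*[(df[col].sp_values - df[col].fill_value, df[col].sp_index.to_int_index().indices) for col in columns]))
    # Column index i, repeated once per stored value in column i.
    cols = [np.ones_like(a) * i for (i, a) in enumerate(dat)]
    datF, rowsF, colsF = np.concatenate(dat), np.concatenate(rows), np.concatenate(cols)
    arr = sparse.coo_matrix((datF, (rowsF, colsF)), df.shape, dtype=np.float64)
    return arr.tocsc()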
df = pd.get_dummies(df, sparse=True)
cscMatrix = sparseDfToCsc(df)
Upvotes: 1
Reputation: 231325
I've participated in various questions about converting sparse Pandas objects to scipy sparse matrices.
There is a Pandas method for converting a MultiIndex sparse series to a coo matrix:
http://pandas-docs.github.io/pandas-docs-travis/sparse.html#interaction-with-scipy-sparse
But for going from a data frame to a sparse matrix, see "Pandas sparse dataFrame to sparse matrix, without generating a dense matrix in memory", "How do I create a scipy sparse matrix from a pandas dataframe?" and, more recently, "How can I "sparsify" on two values?"
Once you have a coo matrix, you can easily convert it to csr or csc.
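As a minimal sketch of that last step, the snippet below builds a small coo matrix from made-up (value, row, column) triplets and converts it to both formats. The data and variable names are purely illustrative and are not taken from the question.

```python
import numpy as np
from scipy import sparse

# Illustrative triplets: three nonzero entries in a 3x3 matrix.
data = np.array([1.0, 2.0, 3.0])
rows = np.array([0, 2, 1])
cols = np.array([0, 0, 2])

coo = sparse.coo_matrix((data, (rows, cols)), shape=(3, 3))

# Both conversions work directly on the sparse data; no dense array is built.
csc = coo.tocsc()
csr = coo.tocsr()
```

The coo format is convenient for construction from triplets, while csc and csr are the formats most downstream code expects for column and row operations respectively.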
To avoid confusion, I'd suggest creating a sample dataframe and converting it to dense and then to sparse. That way we have something concrete to test. I used to recommend the Pandas method, without realizing that a MultiIndex series was different from a DataFrame.
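One way such a test case might look is sketched here. It assumes a newer pandas (0.25 or later) that provides the `DataFrame.sparse` accessor, which is not the SparseDataFrame API the posts in this thread were written against. The sample series is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one categorical column with three levels.
s = pd.Series(list("abca"))

# dtype=float keeps the fill value at 0.0, which to_coo() requires.
sparse_df = pd.get_dummies(s, sparse=True, dtype=float)

# Dense reference copy, only sensible for a small test frame.
dense = np.asarray(sparse_df.sparse.to_dense())

# Sparse route: accessor gives a coo matrix, then convert to csc.
csc = sparse_df.sparse.to_coo().tocsc()

# The two routes should agree element by element.
matches = bool((csc.toarray() == dense).all())
```

Comparing against the dense copy is only a check on a toy frame; on real data the point is to skip the dense step entirely.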
Upvotes: 0