Reputation: 1466
I want to save a sparse pandas DataFrame as a parquet file. Unfortunately, it seems that sparse datatypes are not supported by the underlying pyarrow.
Consider this example code:
from scipy.sparse import csr_matrix
import numpy as np
import pandas as pd
arr = np.random.random(size=(1000, 5))
arr[arr < .9] = 0
sp_arr = csr_matrix(arr)
sdf = pd.DataFrame.sparse.from_spmatrix(sp_arr, columns=['a', 'b', 'c', 'd', 'e'])
sdf.to_parquet('testfile.parquet')
This results in the following error:
TypeError: Sparse pandas data (column a) not supported.
My real dataset is very large, so I cannot densify the dataframe. I like the DataFrame format because I can keep row and column names, which a plain NumPy array does not store.
Is there an available workaround or any other way to save the dataframe?
Upvotes: 2
Views: 1418
Reputation: 416
You could probably create an ExtensionArray containing sparse tensors (see the discussion here: https://lists.apache.org/thread/0m2lwnhf9xj57mhjdc9kxn6fhzkppqvo), build a table with this column, and store it to parquet without touching pandas.
Upvotes: 1