Reputation: 101
I am trying to convert a pandas dataframe that is MultiIndexed on two variables (an ID and a DateTime variable) to a Dask dataframe; however, I get the following error:
"NotImplementedError: Dask does not support MultiIndex Dataframes"
I am using the following code:
import pandas as pd
import dask.dataframe as dd
dask_df = dd.from_pandas(pandas_df)
Actually, I have over 700 pandas dataframes (each over 100 MB). I am planning to convert each pandas dataframe into a Dask dataframe and then append them all into one big Dask dataframe to analyze the whole dataset. I think the MultiIndex is the only issue here. Please let me know if I am going about this the wrong way.
Upvotes: 4
Views: 4840
Reputation: 57251
Currently Dask DataFrame does not support dataframes with MultiIndexes.
You might consider converting all but one of your index columns into normal columns with reset_index.
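A minimal sketch of that suggestion, assuming the index levels are named "ID" and "DateTime" as described in the question. The sample data, the helper names keep_one_index_level and to_dask, and the choice of npartitions=4 are all hypothetical, picked only for illustration:

```python
import pandas as pd


def keep_one_index_level(pdf: pd.DataFrame, keep: str) -> pd.DataFrame:
    """Move every index level except `keep` into ordinary columns."""
    levels_to_move = [name for name in pdf.index.names if name != keep]
    return pdf.reset_index(level=levels_to_move)


def to_dask(pdf: pd.DataFrame, npartitions: int = 4):
    """Convert a single-indexed pandas frame to a Dask DataFrame."""
    # Imported here so the pandas-only helper above works without Dask installed.
    import dask.dataframe as dd

    return dd.from_pandas(pdf, npartitions=npartitions)


# Hypothetical stand-in for one of the MultiIndexed frames in the question.
pandas_df = pd.DataFrame(
    {
        "ID": [1, 1, 2, 2],
        "DateTime": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"]
        ),
        "value": [10.0, 11.0, 20.0, 21.0],
    }
).set_index(["ID", "DateTime"])

# "ID" becomes a normal column; "DateTime" stays as the single index.
flat_df = keep_one_index_level(pandas_df, keep="DateTime")

# With Dask installed, the flattened frame converts without the MultiIndex error:
# dask_df = to_dask(flat_df)
```

For the 700-frame case, one option would be to flatten each frame this way, convert each with to_dask, and combine the results with dd.concat. Which level to keep as the index is a judgment call; keeping DateTime is only an assumption here, and the right choice depends on which column you filter or join on most often.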
Upvotes: 4