Sher Afghan

Reputation: 101

How to convert MultiIndex Pandas dataframe to Dask Dataframe

I am trying to convert a pandas DataFrame that has a MultiIndex on two variables (an ID and a DateTime variable) to a Dask DataFrame, but I get the following error:

"NotImplementedError: Dask does not support MultiIndex Dataframes" 

I am using the following code:

import pandas as pd
import dask.dataframe as dd

dask_df = dd.from_pandas(pandas_df)

I actually have over 700 pandas DataFrames (each over 100 MB). I plan to convert each one to a Dask DataFrame and then append them all into one big Dask DataFrame to analyze the whole dataset. I think the MultiIndex is the only issue here. Please let me know if I am going about this the wrong way.

Upvotes: 4

Views: 4840

Answers (1)

MRocklin

Reputation: 57251

Currently Dask DataFrame does not support dataframes with MultiIndexes.

You might consider converting all but one of your index levels into normal columns with reset_index.

Upvotes: 4
