pygabriel

Reputation: 10008

How to keep partitions after performing a group-by aggregation in dask

In my application I perform an aggregation on a dask dataframe using groupby, keyed by a certain id.

However, I would like the aggregation to maintain the partition divisions, since I intend to join the result with other dataframes that are partitioned identically.

import pandas as pd
import numpy as np
import dask.dataframe as dd

df = pd.DataFrame(np.arange(16), columns=['my_data'])
df.index.name = 'my_id'

ddf = dd.from_pandas(df, npartitions=4)
ddf.npartitions
# 4

ddf.divisions
# (0, 4, 8, 12, 15)

aggregated = ddf.groupby('my_id').agg({'my_data': 'count'})
aggregated.divisions
# (None, None)

Is there a way to accomplish this?

Upvotes: 4

Views: 1035

Answers (1)

MRocklin

Reputation: 57271

You probably can't maintain the same partitioning, because dask will need to aggregate counts between partitions. Your data will necessarily have to move around in ways that depend on the values of your data.

If you're looking to ensure that your output has many partitions, then you might use the split_out= keyword to agg.

Upvotes: 2
