Reputation: 392
My dask dataframe looks like this:
A B C D
1 foo xx this
1 foo xx belongs
1 foo xx together
4 bar xx blubb
I want to group by columns A, B and C and join the strings from D with a blank between them, to get:
A B C D
1 foo xx this belongs together
4 bar xx blubb
I see how to do this with pandas:
df_grouped = df.groupby(['A','B','C'])['D'].agg(' '.join).reset_index()
How can this be achieved with dask?
Upvotes: 4
Views: 4963
Reputation: 4004
ddf = ddf.groupby(['A','B','C'])['D'].apply(lambda row: ' '.join(row)).reset_index()
ddf.compute()
Output:
Out[75]:
A B C D
0 1 foo xx this belongs together
0 4 bar xx blubb
Upvotes: 3
Reputation: 57251
You could use a custom aggregation (dask.dataframe.Aggregation), where both the per-chunk and the aggregate operations are your ' '.join method.
https://docs.dask.org/en/latest/dataframe-api.html#custom-aggregation
Upvotes: 1