Reputation: 723
If I had the following df:
   amount name role desc
0     1.0    a    x    f
1     2.0    a    y    g
2     3.0    b    y    h
3     4.0    b    y    j
4     5.0    c    x    k
5     6.0    c    x    l
6     6.0    c    y    p
I want to group by the name and role columns, add up the amount, and also concatenate the desc values with a comma:
   amount name role desc
0     1.0    a    x    f
1     2.0    a    y    g
2     7.0    b    y  h,j
4    11.0    c    x  k,l
6     6.0    c    y    p
What would be the correct way of approaching this?
Side question: if the df were read from a .csv that also had other, unrelated columns, how would I do this calculation and then write the result to a new .csv along with those other columns (same schema as the file that was read)?
Upvotes: 1
Views: 4493
Reputation: 161
Extending @Vaishali's answer: to handle the remaining columns without having to specify each one, you could build a dictionary and pass it as the argument to the agg (aggregate) function.
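keys = ['key1', 'key2']
agg_funcs = {}  # named agg_funcs so the built-in dict is not shadowed
for col in df.columns:
    if col in keys:
        continue  # grouping columns are handled by groupby, not aggregated
    if col == 'column_you_wish_to_merge':
        agg_funcs[col] = ' '.join
    else:
        agg_funcs[col] = 'first'  # or any other group aggregation operation
df.groupby(keys, as_index=False).agg(agg_funcs)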
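As a rough sketch of the same idea on the question's data, the loop can be collapsed into a dict comprehension. The extra column other1 and its values are made up for illustration, and ',' is used as the separator because that is what the question asks for.

```python
import pandas as pd

# Data from the question plus one hypothetical extra column, "other1".
df = pd.DataFrame({
    "amount": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0],
    "name": ["a", "a", "b", "b", "c", "c", "c"],
    "role": ["x", "y", "y", "y", "x", "x", "y"],
    "desc": ["f", "g", "h", "j", "k", "l", "p"],
    "other1": ["u", "v", "w", "w", "z", "z", "q"],
})

keys = ["name", "role"]
# Columns that need a specific rule; everything else falls back to "first".
special = {"amount": "sum", "desc": ",".join}

# One entry per non-key column, so new columns are picked up automatically.
agg_funcs = {
    col: special.get(col, "first")
    for col in df.columns
    if col not in keys
}

result = df.groupby(keys, as_index=False).agg(agg_funcs)
print(result)
```

Whether 'first' is a sensible default depends on the data: it silently keeps one value per group, so it is only safe when the other columns do not vary within a group or when losing the rest is acceptable.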
Upvotes: 1
Reputation: 38415
This may not be an exact duplicate, but there are a lot of existing questions about groupby and agg.
df.groupby(['name', 'role'], as_index=False)\
.agg({'amount':'sum', 'desc':lambda x: ','.join(x)})
  name role  amount desc
0    a    x     1.0    f
1    a    y     2.0    g
2    b    y     7.0  h,j
3    c    x    11.0  k,l
4    c    y     6.0    p
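For anyone who wants to run this end to end, here is a minimal self-contained version that rebuilds the question's df first. It passes ','.join directly instead of wrapping it in a lambda, which should behave the same way here since each group's desc values are plain strings.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "amount": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0],
    "name": ["a", "a", "b", "b", "c", "c", "c"],
    "role": ["x", "y", "y", "y", "x", "x", "y"],
    "desc": ["f", "g", "h", "j", "k", "l", "p"],
})

# Sum the amounts and join the descriptions within each (name, role) group.
result = (
    df.groupby(["name", "role"], as_index=False)
      .agg({"amount": "sum", "desc": ",".join})
)
print(result)
```

Note that the join only works if desc holds strings; a column with missing values or numbers would need something like astype(str) first.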
Edit: If there are other columns in the dataframe, you can aggregate them using 'first' or 'last', or, if their values are identical within each group, include them in the grouping keys.
Option 1:
df.groupby(['name', 'role'], as_index=False).agg({'amount':'sum', 'desc':lambda x: ','.join(x), 'other1':'first', 'other2':'first'})
Option 2:
df.groupby(['name', 'role', 'other1', 'other2'], as_index=False).agg({'amount':'sum', 'desc':lambda x: ','.join(x)})
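For the side question about reading from and writing to a .csv, a possible sketch using Option 1 follows. The column other1 and its values are invented for the example, and in-memory buffers stand in for real files so the snippet runs as is; with actual files you would pass paths to read_csv and to_csv instead.

```python
import io

import pandas as pd

# Stand-in for the input file; "other1" is a hypothetical unrelated column.
csv_text = """amount,name,role,desc,other1
1.0,a,x,f,u
2.0,a,y,g,v
3.0,b,y,h,w
4.0,b,y,j,w
5.0,c,x,k,z
6.0,c,x,l,z
6.0,c,y,p,q
"""
df = pd.read_csv(io.StringIO(csv_text))

keys = ["name", "role"]
agg_funcs = {"amount": "sum", "desc": ",".join, "other1": "first"}

result = df.groupby(keys, as_index=False).agg(agg_funcs)

# groupby puts the key columns first, so restore the input column order
# to keep the same schema as the file that was read.
result = result[df.columns]

# index=False keeps the row index out of the output file.
out = io.StringIO()
result.to_csv(out, index=False)
print(out.getvalue())
```

Because the joined desc values contain commas, to_csv wraps them in double quotes, so the file still parses back into the same columns.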
Upvotes: 9