Reputation: 207
my df:
description total average number
0 NFL football (white) L 49693 66 1007
1 NFL football (white) XL 79682 74 1198
2 NFL football (white) XS 84943 81 3792
3 NFL football (white) S 78371 73 3974
4 NFL football (blue) L 99482 92 3978
5 NFL football (blue) M 32192 51 3135
6 NFL football (blue XL 75343 71 2879
7 NFL football (red) XXL 84391 79 1192
8 NFL football (red) XS 34727 57 992
9 NFL football (red) L 44993 63 1562
What I would like to do is remove the sizes and be left with a sum total, mean average and sum number for each colour of football:
description total average number
0 NFL football (white) 292689 74 9971
1 NFL football (blue) 207017 71 9992
2 NFL football (red) 164111 66 3746
Any suggestions much appreciated!
Upvotes: 0
Views: 110
Reputation: 503
replace works, but you can also just drop off the last word in the description using rsplit and then do a groupby:
df.description = df.description.apply(lambda x: x.rsplit(' ',1)[0])
df.groupby(by='description')[['total', 'number']].sum()
Upvotes: 0
Reputation: 23217
You can groupby
the reformatted description
field (without modifying original contents of description
) where the reformatting is done by splitting with a space and exclude the last part by using .str.split()
, .str.join()
. Then aggregate with .agg()
.
Further reformatting the output to the desired output by rounding up and casting to interger with .round()
and .astype()
.
(df.groupby(
df['description'].str.split(' ').str[:-1].str.join(' ')
)
.agg({'total': 'sum', 'average': 'mean', 'number': 'sum'})
.round(0)
.astype(int)
).reset_index()
Result:
description total average number
0 NFL football (blue) 207017 71 9992
1 NFL football (red) 164111 66 3746
2 NFL football (white) 292689 74 9971
Upvotes: 2