Ed Jefferies
Ed Jefferies

Reputation: 207

Remove substring and merge rows in python/pandas

my df:

   description               total      average      number
0 NFL football (white) L     49693        66       1007
1 NFL football (white) XL    79682        74       1198
2 NFL football (white) XS    84943        81       3792
3 NFL football (white) S     78371        73       3974
4 NFL football (blue) L      99482        92       3978
5 NFL football (blue) M      32192        51       3135
6 NFL football (blue XL      75343        71       2879
7 NFL football (red) XXL     84391        79       1192
8 NFL football (red) XS      34727        57       992
9 NFL football (red) L       44993        63       1562

What I would like to do is remove the sizes and be left with a sum total, mean average and sum number for each colour of football:

   description               total      average    number
0 NFL football (white)       292689       74       9971
1 NFL football (blue)        207017       71       9992
2 NFL football (red)         164111       66       3746

Any suggestions much appreciated!

Upvotes: 0

Views: 110

Answers (2)

supercooler8
supercooler8

Reputation: 503

replace works, but you can also just drop off the last word in the description using rsplit and then do a groupby:

df.description = df.description.apply(lambda x: x.rsplit(' ',1)[0])

df.groupby(by='description')[['total', 'number']].sum() 

Upvotes: 0

SeaBean
SeaBean

Reputation: 23217

You can groupby the reformatted description field (without modifying original contents of description) where the reformatting is done by splitting with a space and exclude the last part by using .str.split(), .str.join(). Then aggregate with .agg().

Further reformatting the output to the desired output by rounding up and casting to interger with .round() and .astype().

(df.groupby(
            df['description'].str.split(' ').str[:-1].str.join(' ')
           )
   .agg({'total': 'sum', 'average': 'mean', 'number': 'sum'})
   .round(0)
   .astype(int)
).reset_index()

Result:

            description   total  average  number
0   NFL football (blue)  207017       71    9992
1    NFL football (red)  164111       66    3746
2  NFL football (white)  292689       74    9971

Upvotes: 2

Related Questions