I am trying to do a groupby and sum over specific row types. For example, 3 companies sell shoes, coats and slippers; I want to group by company and add together specific sell types (shoe + coat).
Text input:
company selltype price
0 a shoe 34
1 a coat 23
2 a slippers 12
3 b shoe 55
4 b coat 34
5 b slippers 23
6 c shoe 65
7 c coat 34
8 c slippers 12
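To try the answers without copying the table to the clipboard, the sample data can also be built directly. This is a minimal sketch; the variable name df is an assumption chosen to match the groupby answers (the first answer reads the same table into df2).

```python
import pandas as pd

# Sample data from the question, built in code instead of pd.read_clipboard()
# (the name df is assumed to match the answers)
df = pd.DataFrame({
    'company': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'selltype': ['shoe', 'coat', 'slippers'] * 3,
    'price': [34, 23, 12, 55, 34, 23, 65, 34, 12],
})
print(df)
```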
This takes quite a few more steps and is not as concise as the other answers, but it breaks the process down step by step:
import pandas as pd
# read the data in from the clipboard (copied off Stack Overflow)
df2 = pd.read_clipboard()
# sum the coats with the shoes; select 'price' so the text column is not summed
dfnoslip = df2[df2.selltype != 'slippers']
dfnoslipnext = dfnoslip.groupby('company', as_index=False)['price'].sum()
dfnoslipnext.insert(1, 'selltype', 'shoe+coat')
# get just the slippers
dfnocoatshoe = df2.query('selltype != "coat" & selltype != "shoe"')
# combine and sort (DataFrame.append was removed in pandas 2.0, so use pd.concat)
dfnewcombine = pd.concat([dfnocoatshoe, dfnoslipnext])
dfnewcombine = dfnewcombine.sort_values('company')
dfnewcombine
Use groupby + agg:
i = df.selltype.isin(['shoe', 'coat'])
j = i.ne(i.shift()).cumsum()
f = {'selltype' : '+'.join, 'price' : 'sum'}
df.groupby(['company', j], as_index=False).agg(f)
company selltype price
0 a shoe+coat 57
1 a slippers 12
2 b shoe+coat 89
3 b slippers 23
4 c shoe+coat 99
5 c slippers 12
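A self-contained sketch of the same approach, assuming the sample table is rebuilt in code. One small deviation from the answer: the run labels are renamed to block (a hypothetical name) and dropped after grouping. j inherits the name selltype from the column it was derived from, so giving the helper key its own name avoids any clash with the aggregated selltype column.

```python
import pandas as pd

# Sample data from the question (assumed rebuilt in code)
df = pd.DataFrame({
    'company': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'selltype': ['shoe', 'coat', 'slippers'] * 3,
    'price': [34, 23, 12, 55, 34, 23, 65, 34, 12],
})

# True for the rows that should be merged together
i = df.selltype.isin(['shoe', 'coat'])
# New label every time i flips; 'block' is a hypothetical name for the helper key
block = i.ne(i.shift()).cumsum().rename('block')

f = {'selltype': '+'.join, 'price': 'sum'}
out = (df.groupby(['company', block])
         .agg(f)
         .reset_index('block', drop=True)  # the helper label is not needed in the output
         .reset_index())                   # turn 'company' back into a column
print(out)
```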
Details
We need to group on two keys: the company column and, since shoes and coats are treated together, a custom series that reflects this. That series is computed using i and j:
i = df.selltype.isin(['shoe', 'coat'])
i
0 True
1 True
2 False
3 True
4 True
5 False
6 True
7 True
8 False
Name: selltype, dtype: bool
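j then labels each run of consecutive equal values in i: i.ne(i.shift()) is True wherever a row differs from the one above it, and cumsum turns those change points into increasing run numbers.
j = i.ne(i.shift()).cumsum()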
j
0 1
1 1
2 2
3 3
4 3
5 4
6 5
7 5
8 6
Name: selltype, dtype: int64
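One consequence of this construction is worth spelling out: since j numbers runs of consecutive values, it only merges shoe and coat when their rows are adjacent within a company. The small hypothetical series below puts slippers between them, and the two rows end up in different runs, so such data would need to be sorted first.

```python
import pandas as pd

# Hypothetical ordering (not from the question): slippers sits between shoe and coat
s = pd.Series(['shoe', 'slippers', 'coat'])
i = s.isin(['shoe', 'coat'])
# Same run-labelling expression as in the answer
j = i.ne(i.shift()).cumsum()
print(j.tolist())  # [1, 2, 3]: shoe and coat get different labels
```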
Now all that's left is the grouping operation:
df = df.groupby(['company', j], as_index=False).agg(f)
To get your exact output, you can go one step further and blank out repeated company names with pd.Series.where:
df.company = df.company.where(df.company.ne(df.company.shift()), '')
df
company selltype price
0 a shoe+coat 57
1 slippers 12
2 b shoe+coat 89
3 slippers 23
4 c shoe+coat 99
5 slippers 12
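The where step keeps each value where the condition is True and substitutes '' elsewhere, so only the first row of each company keeps its label. A minimal illustration on a hypothetical company column is below; note that the blanked column is for display only and can no longer serve as a grouping key.

```python
import pandas as pd

# Hypothetical company column, already in grouped order
company = pd.Series(['a', 'a', 'b', 'b', 'c', 'c'])
# Keep a label only where it differs from the row above; otherwise use ''
blanked = company.where(company.ne(company.shift()), '')
print(blanked.tolist())  # ['a', '', 'b', '', 'c', '']
```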
# map 'shoe' onto 'coat' so shoe and coat rows share one group key
treatsame = {'shoe': 'coat'}
(df.groupby([df.company, df.selltype.replace(treatsame)])
   .agg(lambda x: x.sum() if x.dtype == 'int64' else '+'.join(x))
   .reset_index('selltype', drop=True))
Out[40]:
selltype price
company
a shoe+coat 57
a slippers 12
b shoe+coat 89
b slippers 23
c shoe+coat 99
c slippers 12
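A self-contained sketch of this answer, assuming the sample table is rebuilt in code. Two changes are assumptions of this sketch rather than part of the answer: the replaced key is renamed group (a hypothetical name) so it does not share the selltype name with the column being joined, and the numeric test uses pd.api.types.is_numeric_dtype instead of x.dtype == 'int64', which would send a float price column (for example one containing NaN) to '+'.join and raise a TypeError.

```python
import pandas as pd

# Sample data from the question (assumed rebuilt in code)
df = pd.DataFrame({
    'company': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'selltype': ['shoe', 'coat', 'slippers'] * 3,
    'price': [34, 23, 12, 55, 34, 23, 65, 34, 12],
})

# Map 'shoe' onto 'coat' so both rows share one grouping label;
# 'group' is a hypothetical name for this helper key
treatsame = {'shoe': 'coat'}
key = df.selltype.replace(treatsame).rename('group')

# Sum numeric columns, join text columns with '+'
out = (df.groupby(['company', key])
         .agg(lambda x: x.sum() if pd.api.types.is_numeric_dtype(x) else '+'.join(x))
         .reset_index('group', drop=True))
print(out)
```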