jeff

Reputation: 37

Groupby and sum by a specific row type in pandas

I am trying to do a groupby and sum by a specific row type. For example, 3 companies sell shoes, coats and slippers, and I want to group by company and add up the prices of specific sell types together (shoe + coat).


Text input:

  company  selltype  price
0       a      shoe     34
1       a      coat     23
2       a  slippers     12
3       b      shoe     55
4       b      coat     34
5       b  slippers     23
6       c      shoe     65
7       c      coat     34
8       c  slippers     12
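To try the answers below without pd.read_clipboard(), the sample data can also be built directly. This is a minimal sketch; the name df is chosen only to match the answers that use it.

```python
import pandas as pd

# Sample data from the question: three companies, three sell types each
df = pd.DataFrame({
    'company': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'selltype': ['shoe', 'coat', 'slippers'] * 3,
    'price': [34, 23, 12, 55, 34, 23, 65, 34, 12],
})
print(df)
```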

Upvotes: 1

Views: 811

Answers (3)

Jim Factor

Reputation: 1535

Quite a few more steps and not as concise as the other answers, but it breaks the process down step by step.

import pandas as pd

# read the data in from the clipboard (copied off Stack Overflow)
df2 = pd.read_clipboard()
# sum the coats with the shoes; select 'price' so the selltype strings are not summed too
dfnoslip = df2[df2.selltype != 'slippers']
dfnoslipnext = dfnoslip.groupby('company', as_index=False)['price'].sum()
dfnoslipnext.insert(1, 'selltype', 'shoe+coat')
# get just the slippers
dfnocoatshoe = df2.query('selltype != "coat" & selltype != "shoe"')
# combine and sort (DataFrame.append was removed in pandas 2.0, so use pd.concat)
dfnewcombine = pd.concat([dfnocoatshoe, dfnoslipnext])
dfnewcombine = dfnewcombine.sort_values('company')
dfnewcombine
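A self-contained version of the same steps, assuming the sample data is built inline rather than read from the clipboard. The stable sort (kind='stable') is an addition of this sketch: it keeps each company's shoe+coat row ahead of its slippers row, which the default (non-stable) sort does not guarantee.

```python
import pandas as pd

# Sample data from the question, built inline instead of read_clipboard()
df2 = pd.DataFrame({
    'company': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'selltype': ['shoe', 'coat', 'slippers'] * 3,
    'price': [34, 23, 12, 55, 34, 23, 65, 34, 12],
})

# Sum shoe and coat prices per company (only the price column is summed)
combined = (df2[df2.selltype != 'slippers']
            .groupby('company', as_index=False)['price'].sum())
combined.insert(1, 'selltype', 'shoe+coat')

# Keep the slippers rows unchanged
slippers = df2[df2.selltype == 'slippers']

# Stack both pieces; the stable sort preserves concat order within a company
result = (pd.concat([combined, slippers])
          .sort_values('company', kind='stable')
          .reset_index(drop=True))
print(result)
```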

Upvotes: 0

cs95

Reputation: 402854

Use groupby + agg:

i = df.selltype.isin(['shoe', 'coat'])
j = i.ne(i.shift()).cumsum()

f = {'selltype' : '+'.join, 'price' : 'sum'}
df.groupby(['company', j], as_index=False).agg(f)

  company   selltype  price
0       a  shoe+coat     57
1       a   slippers     12
2       b  shoe+coat     89
3       b   slippers     23
4       c  shoe+coat     99
5       c   slippers     12

Details

We need to group on two keys:

  1. the company column, and
  2. the merchandise being sold
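pandas accepts a mix of column names and separately computed Series as groupby keys, with the Series aligned on the frame's index. A minimal sketch with hypothetical toy data, where batch plays the role that j plays in this answer:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({'company': ['a', 'a', 'b', 'b'], 'price': [1, 2, 3, 4]})

# A computed key that is not a column of df (aligned on df's index)
batch = pd.Series([0, 1, 0, 0], index=df.index)

# Group by the column name and the external Series together
totals = df.groupby(['company', batch])['price'].sum()
print(totals)
```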

Since shoes and coats are treated as a single group, we need a custom grouping series that reflects this. It is computed using i (a mask marking the shoe and coat rows) and j (a label for each consecutive run of equal mask values):

i = df.selltype.isin(['shoe', 'coat'])
i

0     True
1     True
2    False
3     True
4     True
5    False
6     True
7     True
8    False
Name: selltype, dtype: bool

j = i.ne(i.shift()).cumsum()
j

0    1
1    1
2    2
3    3
4    3
5    4
6    5
7    5
8    6
Name: selltype, dtype: int64
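The ne/shift/cumsum idiom used for j labels consecutive runs of equal values. A minimal sketch on a bare boolean series; note that the labels depend on row order, so the shoe and coat rows must be adjacent within each company (as they are in the sample data) for them to share a run.

```python
import pandas as pd

# A boolean flag per row, e.g. "is this row a shoe or a coat?"
flag = pd.Series([True, True, False, True, True, False])

# A new run starts wherever the flag differs from the previous row;
# cumsum turns those start markers into consecutive run labels
runs = flag.ne(flag.shift()).cumsum()
print(runs.tolist())  # [1, 1, 2, 3, 3, 4]

# If matching rows are not adjacent, they end up in different runs
flag2 = pd.Series([True, False, True])
runs2 = flag2.ne(flag2.shift()).cumsum()
print(runs2.tolist())  # [1, 2, 3]
```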

Now all that's left is the grouping operation:

df = df.groupby(['company', j], as_index=False).agg(f)
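Putting the pieces of this answer together, a self-contained sketch that rebuilds the sample data inline (an assumption, in place of reading it from the clipboard) and runs the same i, j and f steps:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'company': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'selltype': ['shoe', 'coat', 'slippers'] * 3,
    'price': [34, 23, 12, 55, 34, 23, 65, 34, 12],
})

# True for the sell types that should be summed together
i = df.selltype.isin(['shoe', 'coat'])
# Label consecutive runs of equal flags: 1, 1, 2, 3, 3, 4, ...
j = i.ne(i.shift()).cumsum()

# Join the sell-type names and sum the prices within each run
f = {'selltype': '+'.join, 'price': 'sum'}
out = df.groupby(['company', j], as_index=False).agg(f)
print(out)
```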

To match the layout in the question, with each company name shown only once, do one more step with pd.Series.where:

df.company = df.company.where(df.company.ne(df.company.shift()), '')
df

  company   selltype  price
0       a  shoe+coat     57
1           slippers     12
2       b  shoe+coat     89
3           slippers     23
4       c  shoe+coat     99
5           slippers     12
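The where/shift step keeps a value only where it differs from the row above. A minimal sketch on the company column alone; this is display-only, since once the repeats are blanked the column can no longer be used to group rows by company.

```python
import pandas as pd

# Company column as it comes out of the groupby (repeats are adjacent)
company = pd.Series(['a', 'a', 'b', 'b', 'c', 'c'])

# Keep the value where it changes from the previous row, otherwise ''
shown = company.where(company.ne(company.shift()), '')
print(shown.tolist())  # ['a', '', 'b', '', 'c', '']
```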

Upvotes: 1

BENY

Reputation: 323326

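import pandas as pd

# treat 'shoe' as the same sell type as 'coat' so they share a group
treatsame = {'shoe': 'coat'}
(df.groupby([df.company, df.selltype.replace(treatsame)])
   .agg(lambda x: x.sum() if pd.api.types.is_numeric_dtype(x) else '+'.join(x))  # sum numbers, join text
   .reset_index('selltype', drop=True))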
Out[40]: 
          selltype  price
company                  
a        shoe+coat     57
a         slippers     12
b        shoe+coat     89
b         slippers     23
c        shoe+coat     99
c         slippers     12
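The key step in this answer is relabeling 'shoe' as 'coat' so both rows fall into one group. A minimal sketch of the same idea that passes a per-column dict to agg instead of branching on dtype inside a lambda; with the original x.dtype=='int64' check, a float price column would have been passed to '+'.join and raised a TypeError. The sample data is rebuilt inline so the block runs on its own.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'company': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'selltype': ['shoe', 'coat', 'slippers'] * 3,
    'price': [34, 23, 12, 55, 34, 23, 65, 34, 12],
})

# Map 'shoe' onto 'coat' so both rows share one group label
key = df.selltype.replace({'shoe': 'coat'})

# Explicit per-column aggregation: join the names, sum the prices
out = (df.groupby([df.company, key])
         .agg({'selltype': '+'.join, 'price': 'sum'})
         .reset_index(level=1, drop=True))
print(out)
```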

Upvotes: 1
