campbeb2
campbeb2

Reputation: 69

Python Groupby with Conditions

I have a dataframe with three columns. I need to group by Name and sum the Amount column while counting the Fruit column only if it meets a certain criteria. Here's an example of what the dataframe looks like:

 Name Fruit   Amount
 Bob  Apple   5
 Bob  Orange  7
 Bill Apple   4
 Bill Orange  8
 Bill Apple   3

In this example I'd group Bob and Bill's names, sum the Amount for Apple and Orange, but count only the number of occurrences of Apple, so the new dataframe would look like this:

 Name   Sum   Count
 Bob    12      1
 Bill   15      2

I've tried the following but don't get to the correct answer:

df = pd.pivot_table(
    df,
    index=['Name'],
    aggfunc={'Amount':np.sum}
    ).rename(columns={'Fruit':'count'})

Upvotes: 2

Views: 45

Answers (2)

sammywemmy
sammywemmy

Reputation: 28659

It should be faster if you avoid the anonymous function (depending on the data size):


lump = [df.groupby('Name').Amount.sum(), 
        df.Fruit.eq('Apple').groupby(df.Name).sum().rename('Count')]

pd.concat(lump, axis = 1)

      Amount  Count
Name               
Bill      15      2
Bob       12      1

You could also create a temporary column, and used a named aggregation; again, the aim is to avoid the anonymous(lambda function):

(df.assign(Count = df.Fruit.eq('Apple'))
   .groupby('Name')
   .agg(Sum = ('Amount', 'sum'), 
        Count=('Count', 'sum'))
) 
      Sum  Count
Name            
Bill   15      2
Bob    12      1

Upvotes: 1

wwnde
wwnde

Reputation: 26676

Use agg

df.groupby('Name').agg(**{'Sum':('Amount','sum'),'Count':('Fruit', lambda x: (x=='Apple').sum())}).reset_index()

   Name  Sum  Count
0  Bill   15      2
1   Bob   12      1

Upvotes: 2

Related Questions