Justin Benfit
Justin Benfit

Reputation: 483

How to group by one column if condition is true in another column summing values in third column with pandas

I can't think of how to do this: As the headline explains I want to group a dataframe by the column acquired_month only if another column contains Closed Won(in the example I made a helper column that just marks True if that condition is fulfilled although I'm not sure that step is necessary). Then if those conditions are met I want to sum the values of a third column but can't think how to do it. Here is my code so far:

us_lead_scoring.loc[us_lead_scoring['Stage'].str.contains('Closed Won'), 'closed_won_binary'] = True acquired_date = us_lead_scoring.groupby('acquired_month')['closed_won_binary'].sum()

but this just sums the true false column not the sum column if the true false column is true after the acquired_month groupby. Any direction appreciated.

Thanks

Upvotes: 1

Views: 684

Answers (1)

jezrael
jezrael

Reputation: 862406

If need aggregate column col replace non matched values to 0 values in Series.where and then aggregate sum:

us_lead_scoring = pd.DataFrame({'Stage':['Closed Won1','Closed Won2','Closed', 'Won'],
                                'col':[1,3,5,6],
                                'acquired_month':[1,1,1,2]})

out = (us_lead_scoring['col'].where(us_lead_scoring['Stage']
                             .str.contains('Closed Won'), 0)
                             .groupby(us_lead_scoring['acquired_month'])
                             .sum()
                             .reset_index(name='SUM'))
     
print (out)
   acquired_month  SUM
0               1    4
1               2    0

Upvotes: 1

Related Questions