GlassShark1
GlassShark1

Reputation: 217

Sum a column in a pandas dataframe where a condition is met in one column, but grouped by another

I have a dataframe like this:

        Ref_No  Definition  Total_to_Add
    0   ref1        B            20        
    1   ref2        A            30        
    2   ref1        B            40        
    3   ref2        A            50        
    4   ref1        B            60        
    5   ref2        B            50         
    6   ref1        B            60        
    7   ref2        B            50        
    8   ref1        B            60        

For each reference, I want to sum Total_to_Add if they are 'B' and the same reference number (i'll have another column for A). There are 100's of reference numbers.

I can sum those that meet a single condition like this:

df['ANSWER'] = df[df['Definition']=='A']['Total_to_Add'].sum()

Or i can group by a reference like this:

df['ANSWER']=(df.groupby('Ref_No')['Total_to_Add'].transform('sum'))

But i can't seem to combine these functions, i.e. create a new column that totals up if the definition is 'B' and total by Ref_No.

I'm aiming for output like the below:

        Ref_No  Definition  Total_to_Add  Total_'B'
    0   ref1        B            20        240
    1   ref2        A            30        100
    2   ref1        B            40        240
    3   ref2        A            50        100
    4   ref1        B            60        240
    5   ref2        B            50        100 
    6   ref1        B            60        240
    7   ref2        B            50        100
    8   ref1        B            60        240

Any wisdom appreciated! Thanks

Upvotes: 1

Views: 242

Answers (4)

jezrael
jezrael

Reputation: 862406

Replace non B values to 0 by Series.where and then use GroupBy.transform:

df['ANSWER']= (df['Total_to_Add'].where(df.Definition=='B', 0)
                                 .groupby(df['Ref_No']).transform('sum'))
print (df)
  Ref_No Definition  Total_to_Add  Total_'B'  ANSWER
0   ref1          B            20        240     240
1   ref2          A            30        100     100
2   ref1          B            40        240     240
3   ref2          A            50        100     100
4   ref1          B            60        240     240
5   ref2          B            50        100     100
6   ref1          B            60        240     240
7   ref2          B            50        100     100
8   ref1          B            60        240     240

Upvotes: 2

BENY
BENY

Reputation: 323226

I will do transform

s=df['Total_to_Add'].mask(df.Definition!='B').groupby(df['Ref_No']).transform('sum')
s
0    240.0
1    100.0
2    240.0
3    100.0
4    240.0
5    100.0
6    240.0
7    100.0
8    240.0
Name: Total_to_Add, dtype: float64

df['New']=s

Upvotes: 1

davidbilla
davidbilla

Reputation: 2222

This will produce the sum of 'Total_to_Add' in 'Total_B' column if the 'Definition' == 'B'. df['Total_B']=df[df['Definition']=='B'].groupby(by=['Ref_No','Definition'])['Total_to_Add'].transform('sum')

Upvotes: 1

Chris Adams
Chris Adams

Reputation: 18647

Try:

df['Total_B'] = (df['Definition'].eq('B').mul(df['Total_to_Add'])
                 .groupby(df['Ref_No']).transform('sum'))

[out]

  Ref_No Definition  Total_to_Add  Total_B
0   ref1          B            20      240
1   ref2          A            30      100
2   ref1          B            40      240
3   ref2          A            50      100
4   ref1          B            60      240
5   ref2          B            50      100
6   ref1          B            60      240
7   ref2          B            50      100
8   ref1          B            60      240

Upvotes: 2

Related Questions