Quantatia
Quantatia

Reputation: 91

Conditional Summing in Pandas

I have a 3 column dataframe named df:

Currency Product  Balance         
USD      A        10       
GBP      B        20       
USD      A        15       
GBP      A        5        
USD      B        30       
USD      A        20       
USD      B        5       

And want to calculate the weight of each unique combination of product and currency.

The following code gives me the total for each combination:

df.groupby(['Currency','Product'])['Balance'].sum().

I wish to add a 4th column where I take the Balance number and divide it by looking up this aggregated value in the groupby table for that combination:

Currency Product  Balance  Weight       
USD      A        10       28.57%
GBP      B        20       100.00%
USD      A        15       42.86%
GBP      A        5        100.00%
USD      B        30       85.71%
USD      A        20       57.14%
USD      B        5        14.29%

Pseudo-code would be:

df['Weight'] = df['Balance'] / look up value in the groupby table above

The numerator in the weight is balance and the denominator is the total for the same product in that currency, so 3 products and 4 currencies would give us 12 totals.

Please help.

Upvotes: 2

Views: 168

Answers (2)

Alexander
Alexander

Reputation: 109556

You can divide the balance by the transform of the groupby (which keeps the same dimension as the original dataframe):

df['Weight'] = df.Balance / df.groupby(['Currency', 'Product']).Balance.transform(sum)

>>> df
  Currency Product  Balance    Weight
0      USD       A       10  0.222222
1      GBP       B       20  1.000000
2      USD       A       15  0.333333
3      GBP       A        5  1.000000
4      USD       B       30  0.857143
5      USD       A       20  0.444444
6      USD       B        5  0.142857

To better understand what is happening:

df['Sum'] = df.groupby(['Currency', 'Product']).Balance.transform(sum)

>>> df
  Currency Product  Balance    Weight  Sum
0      USD       A       10  0.222222   45
1      GBP       B       20  1.000000   20
2      USD       A       15  0.333333   45
3      GBP       A        5  1.000000    5
4      USD       B       30  0.857143   35
5      USD       A       20  0.444444   45
6      USD       B        5  0.142857   35

>>> df.groupby(['Currency', 'Product']).Balance.sum()
Currency  Product
GBP       A           5
          B          20
USD       A          45
          B          35
Name: Balance, dtype: int64

Upvotes: 2

efajardo
efajardo

Reputation: 797

You could use apply:

agg = df.groupby(['Currency', 'Product']).sum()
df['weight'] = df.apply(lambda row: row['Balance']/agg.loc[row['Currency']].loc[row['Product']], axis=1)

Upvotes: 0

Related Questions