Reputation: 91
I have a 3 column dataframe named df
:
Currency Product Balance
USD A 10
GBP B 20
USD A 15
GBP A 5
USD B 30
USD A 20
USD B 5
And want to calculate the weight of each unique combination of product and currency.
The following code gives me the total for each combination:
df.groupby(['Currency','Product'])['Balance'].sum().
I wish to add a 4th column where I take the Balance number and divide it by looking up this aggregated value in the groupby table for that combination:
Currency Product Balance Weight
USD A 10 28.57%
GBP B 20 100.00%
USD A 15 42.86%
GBP A 5 100.00%
USD B 30 85.71%
USD A 20 57.14%
USD B 5 14.29%
Pseudo-code would be:
df['Weight'] = df['Balance'] / look up value in the groupby table above
The numerator in the weight is balance and the denominator is the total for the same product in that currency, so 3 products and 4 currencies would give us 12 totals.
Please help.
Upvotes: 2
Views: 168
Reputation: 109556
You can divide the balance by the transform
of the groupby
(which keeps the same dimension as the original dataframe):
df['Weight'] = df.Balance / df.groupby(['Currency', 'Product']).Balance.transform(sum)
>>> df
Currency Product Balance Weight
0 USD A 10 0.222222
1 GBP B 20 1.000000
2 USD A 15 0.333333
3 GBP A 5 1.000000
4 USD B 30 0.857143
5 USD A 20 0.444444
6 USD B 5 0.142857
To better understand what is happening:
df['Sum'] = df.groupby(['Currency', 'Product']).Balance.transform(sum)
>>> df
Currency Product Balance Weight Sum
0 USD A 10 0.222222 45
1 GBP B 20 1.000000 20
2 USD A 15 0.333333 45
3 GBP A 5 1.000000 5
4 USD B 30 0.857143 35
5 USD A 20 0.444444 45
6 USD B 5 0.142857 35
>>> df.groupby(['Currency', 'Product']).Balance.sum()
Currency Product
GBP A 5
B 20
USD A 45
B 35
Name: Balance, dtype: int64
Upvotes: 2
Reputation: 797
You could use apply:
agg = df.groupby(['Currency', 'Product']).sum()
df['weight'] = df.apply(lambda row: row['Balance']/agg.loc[row['Currency']].loc[row['Product']], axis=1)
Upvotes: 0