Reputation: 1130
I have a dataset given below:
date product_category product_type amount
2020-01-01 A 1 15
2020-01-01 A 2 25
2020-01-01 A 3 10
2020-01-02 B 1 15
2020-01-02 B 2 10
2020-01-03 C 2 100
2020-01-03 C 1 250
2020-01-03 C 3 150
I am trying to convert this data with a normalized amount based on product_category and date
given below:
date product_category product_type amount
2020-01-01 A 1 0.30
2020-01-01 A 2 0.50
2020-01-01 A 3 0.20
2020-01-02 B 1 0.60
2020-01-02 B 2 0.40
2020-01-03 C 2 0.20
2020-01-03 C 1 0.50
2020-01-03 C 3 0.30
Is there any way to do with python dataframes and updating the original panda dataframe?
Upvotes: 1
Views: 100
Reputation: 862521
Use GroupBy.transform
with sum for repeat aggregated sum
, so possible divide by original column amount
:
#to new column
df['norm'] = df['amount'].div(df.groupby(['date','product_category'])['amount'].transform('sum'))
#rewrite original column
#df['amount'] = df['amount'].div(df.groupby(['date','product_category'])['amount'].transform('sum'))
print (df)
date product_category product_type amount norm
0 2020-01-01 A 1 15 0.3
1 2020-01-01 A 2 25 0.5
2 2020-01-01 A 3 10 0.2
3 2020-01-02 B 1 15 0.6
4 2020-01-02 B 2 10 0.4
5 2020-01-03 C 2 100 0.2
6 2020-01-03 C 1 250 0.5
7 2020-01-03 C 3 150 0.3
Upvotes: 1