user3104352
user3104352

Reputation: 1130

how to calculate the percentage in a group of columns in pandas dataframe while keeping the original format of data

I have a dataset given below:

   date        product_category   product_type   amount 
2020-01-01             A               1           15
2020-01-01             A               2           25
2020-01-01             A               3           10
2020-01-02             B               1           15
2020-01-02             B               2           10
2020-01-03             C               2           100
2020-01-03             C               1           250
2020-01-03             C               3           150

I am trying to convert this data with a normalized amount based on product_category and date given below:

   date        product_category   product_type   amount 
2020-01-01             A               1           0.30
2020-01-01             A               2           0.50
2020-01-01             A               3           0.20
2020-01-02             B               1           0.60
2020-01-02             B               2           0.40
2020-01-03             C               2           0.20
2020-01-03             C               1           0.50
2020-01-03             C               3           0.30

Is there any way to do with python dataframes and updating the original panda dataframe?

Upvotes: 1

Views: 100

Answers (1)

jezrael
jezrael

Reputation: 862521

Use GroupBy.transform with sum for repeat aggregated sum, so possible divide by original column amount:

#to new column
df['norm'] = df['amount'].div(df.groupby(['date','product_category'])['amount'].transform('sum'))

#rewrite original column
#df['amount'] = df['amount'].div(df.groupby(['date','product_category'])['amount'].transform('sum'))

print (df)
         date product_category  product_type  amount  norm
0  2020-01-01                A             1      15   0.3
1  2020-01-01                A             2      25   0.5
2  2020-01-01                A             3      10   0.2
3  2020-01-02                B             1      15   0.6
4  2020-01-02                B             2      10   0.4
5  2020-01-03                C             2     100   0.2
6  2020-01-03                C             1     250   0.5
7  2020-01-03                C             3     150   0.3

Upvotes: 1

Related Questions