Jia Gao
Jia Gao

Reputation: 1292

python operations on multiple columns after group

I have a dataframe like below:

df = {'col_1': [1,2,3,4,5,6,7,8,9,10],
      'col_2': [1,1,1,0,0,1,0,0,1,1],
      'col_3':['A','A','A','A','A','B','B','B','B','B']}
df = pd.DataFrame(df)

I want something like:

df['col_4'] =  df.groupby(['col_3'])['col_1','col_2'].transform((col_1*col_2)/sum(col_2))

which is the the multiplication of col_1 and col_2 then divided by sum of col_2 in each group, defined by col_3. The expected result should be:

df
   col_1  col_2 col_3  col_4
0      1      1     A   2.00
1      2      1     A   2.00
2      3      1     A   2.00
3      4      0     A   2.00
4      5      0     A   2.00
5      6      1     B   8.33
6      7      0     B   8.33
7      8      0     B   8.33
8      9      1     B   8.33
9     10      1     B   8.33

Anyone know how to get there?

Upvotes: 1

Views: 39

Answers (1)

jpp
jpp

Reputation: 164683

Assuming you wish to calculatd the sum of col_1 * col_2 groupwise, you can define a GroupBy object and use transform + sum twice:

g = df.eval('prod=col_1*col_2').groupby('col_3')
df['col_4'] = g['prod'].transform('sum') / g['col_2'].transform('sum')

print(df)

   col_1  col_2 col_3     col_4
0      1      1     A  2.000000
1      2      1     A  2.000000
2      3      1     A  2.000000
3      4      0     A  2.000000
4      5      0     A  2.000000
5      6      1     B  8.333333
6      7      0     B  8.333333
7      8      0     B  8.333333
8      9      1     B  8.333333
9     10      1     B  8.333333

Upvotes: 3

Related Questions