Reputation: 4482
I have the following data:
import pandas as pd
x4 = pd.DataFrame({"ID": [101,101, 102, 103, 104, 105],
"Prob": [1, 1,1, 1, 1, 1],
"Ef": [0,2, 0, 0, 0.25, 0.29],
"W": [2, 2,3, 4, 5, 6],
"EC": [0, 0,0, 0, 1.6, 2],
"Rand": [11, 12,12, 13, 14, 15]})
I would like to get sum(Prob * Ef) by ID and then keep only the ID column, the column with the sum, the EC column, and the W column.
So in the end I want to have this:
ID sum_column EC W
1: 101 2.00 0.0 2
2: 101 2.00 0.0 2
3: 102 0.00 0.0 3
4: 103 0.00 0.0 4
5: 104 0.25 1.6 5
6: 105 0.29 2.0 6
I have tried this:
x4.loc[:, ['EC','W','ID','Prob','Ef']].groupby('ID').sum(Prob*Ef)
But it does not work.
Upvotes: 1
Views: 80
Reputation: 862406
Use GroupBy.transform on the product of the two columns, grouped by ID:
x4['sum_column'] = x4['Prob'].mul(x4['Ef']).groupby(x4['ID']).transform('sum')
x4 = x4.drop(['Ef','Prob', 'Rand'], axis=1)
print (x4)
ID W EC sum_column
0 101 2 0.0 2.00
1 101 2 0.0 2.00
2 102 3 0.0 0.00
3 103 4 0.0 0.00
4 104 5 1.6 0.25
5 105 6 2.0 0.29
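To see why transform is used here rather than a plain sum, here is a small sketch on a cut-down frame (the three-row df and the variable names are made up for illustration, they are not from the question). A plain sum returns one row per ID, while transform('sum') returns one value per original row, which is what lets it be assigned back as a column:

```python
import pandas as pd

# Hypothetical cut-down data, only for illustration
df = pd.DataFrame({"ID": [101, 101, 102],
                   "Prob": [1, 1, 1],
                   "Ef": [0.0, 2.0, 0.5]})

product = df["Prob"] * df["Ef"]

# Aggregation: one value per group, indexed by ID
per_group = product.groupby(df["ID"]).sum()

# Transform: the group sum repeated for every row, aligned with df's index
per_row = product.groupby(df["ID"]).transform("sum")

print(len(per_group), len(per_row))
```

Because per_row keeps the original index, it lines up with the rows of df and duplicate IDs simply get the same group total.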
If the order of the columns is important, use DataFrame.insert on the original DataFrame (before any columns are dropped) instead:
x4.insert(1, 'sum_column', x4['Prob'].mul(x4['Ef']).groupby(x4['ID']).transform('sum'))
x4 = x4.drop(['Ef','Prob', 'Rand'], axis=1)
print (x4)
ID sum_column W EC
0 101 2.00 2 0.0
1 101 2.00 2 0.0
2 102 0.00 3 0.0
3 103 0.00 4 0.0
4 104 0.25 5 1.6
5 105 0.29 6 2.0
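If you want exactly the column order from the question (ID, sum_column, EC, W), one possible variant is to pick the columns explicitly instead of dropping the others. This is only a sketch of an alternative, not something the approaches above require; the name result is my own, and it leaves x4 itself unchanged:

```python
import pandas as pd

x4 = pd.DataFrame({"ID": [101, 101, 102, 103, 104, 105],
                   "Prob": [1, 1, 1, 1, 1, 1],
                   "Ef": [0, 2, 0, 0, 0.25, 0.29],
                   "W": [2, 2, 3, 4, 5, 6],
                   "EC": [0, 0, 0, 0, 1.6, 2],
                   "Rand": [11, 12, 12, 13, 14, 15]})

# Add the per-ID sum of Prob * Ef as a new column on a copy,
# then select the wanted columns in the wanted order
result = (
    x4.assign(sum_column=(x4["Prob"] * x4["Ef"]).groupby(x4["ID"]).transform("sum"))
      [["ID", "sum_column", "EC", "W"]]
)
print(result)
```

Selecting with a list of column names means the output order does not depend on where the columns sit in the original frame.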
Upvotes: 2