groupby and sum with pandas for certain columns while including other columns also

Question

I have the following data:

import pandas as pd

x4 = pd.DataFrame({"ID": [101, 101, 102, 103, 104, 105],
                   "Prob": [1, 1, 1, 1, 1, 1],
                   "Ef": [0, 2, 0, 0, 0.25, 0.29],
                   "W": [2, 2, 3, 4, 5, 6],
                   "EC": [0, 0, 0, 0, 1.6, 2],
                   "Rand": [11, 12, 12, 13, 14, 15]})

I would like to get sum(Prob * Ef) by ID, and then keep only the ID column, the column with the sum, the EC column and the W column.

So in the end I want to have this:

    ID  sum_column   EC  W
1: 101        2.00  0.0  2
2: 101        2.00  0.0  2
3: 102        0.00  0.0  3
4: 103        0.00  0.0  4
5: 104        0.25  1.6  5
6: 105        0.29  2.0  6

I have tried this:

x4.loc[:, ['EC','W','ID','Prob','Ef']].groupby('ID').sum(Prob*Ef)

But it does not work.

jezrael · Accepted Answer

Multiply the two columns first, then use GroupBy.transform so that each group's sum is returned for every row of that group:

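# Row-wise product Prob * Ef, summed per ID and aligned back to the original rows
x4['sum_column'] = x4['Prob'].mul(x4['Ef']).groupby(x4['ID']).transform('sum')
# Drop the columns that are no longer needed
x4 = x4.drop(['Ef', 'Prob', 'Rand'], axis=1)
print(x4)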
    ID  W   EC  sum_column
0  101  2  0.0        2.00
1  101  2  0.0        2.00
2  102  3  0.0        0.00
3  103  4  0.0        0.00
4  104  5  1.6        0.25
5  105  6  2.0        0.29
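
To see why transform is used here instead of a plain sum, the following minimal sketch compares the two on the question's data. The names prod, per_group and per_row are only illustrative and do not come from the question.

```python
import pandas as pd

x4 = pd.DataFrame({"ID": [101, 101, 102, 103, 104, 105],
                   "Prob": [1, 1, 1, 1, 1, 1],
                   "Ef": [0, 2, 0, 0, 0.25, 0.29]})

# Row-wise product of the two columns (illustrative name)
prod = x4["Prob"].mul(x4["Ef"])

# Plain aggregation: one value per distinct ID, indexed by ID
per_group = prod.groupby(x4["ID"]).sum()

# transform: one value per original row, keeping the index of x4
per_row = prod.groupby(x4["ID"]).transform("sum")

print(len(per_group), len(per_row))  # 5 6
```

Because per_row has the same index as x4, it can be assigned directly as a new column, and both rows with ID 101 keep the group total.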

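If the order of the columns is important, use DataFrame.insert instead (starting again from the original x4, because the previous snippet has already dropped Prob and Ef):

# Insert the per-ID sum as the second column (position 1)
x4.insert(1, 'sum_column', x4['Prob'].mul(x4['Ef']).groupby(x4['ID']).transform('sum'))
x4 = x4.drop(['Ef', 'Prob', 'Rand'], axis=1)
print(x4)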
    ID  sum_column  W   EC
0  101        2.00  2  0.0
1  101        2.00  2  0.0
2  102        0.00  3  0.0
3  103        0.00  4  0.0
4  104        0.25  5  1.6
5  105        0.29  6  2.0
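
As a possible variant, not part of the approach above, the same result can be built without modifying x4 by combining DataFrame.assign with an explicit column selection. This is a sketch that assumes the column order from the question (ID, sum_column, EC, W); the name out is only illustrative.

```python
import pandas as pd

x4 = pd.DataFrame({"ID": [101, 101, 102, 103, 104, 105],
                   "Prob": [1, 1, 1, 1, 1, 1],
                   "Ef": [0, 2, 0, 0, 0.25, 0.29],
                   "W": [2, 2, 3, 4, 5, 6],
                   "EC": [0, 0, 0, 0, 1.6, 2],
                   "Rand": [11, 12, 12, 13, 14, 15]})

# assign returns a new DataFrame, so x4 itself is left unchanged;
# the final selection both drops unwanted columns and fixes the order
out = x4.assign(
    sum_column=x4["Prob"].mul(x4["Ef"]).groupby(x4["ID"]).transform("sum")
)[["ID", "sum_column", "EC", "W"]]

print(out)
```

Selecting the columns to keep, rather than listing the ones to drop, means the snippet can be rerun safely and does not depend on which other columns x4 happens to contain.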
