Reputation: 515
I am trying to do a pandas groupby and aggregate. I already have the sums of two columns, and I also need the ratio of those two sums (total sum_resp_time divided by total num_requests).
import pandas as pd

# One row per measurement: request count and total response time for an app
tdf = pd.DataFrame(
    [
        {'app': 'A', 'num_requests': 100, 'sum_resp_time': 50000},
        {'app': 'A', 'num_requests': 50, 'sum_resp_time': 40000},
        {'app': 'A', 'num_requests': 70, 'sum_resp_time': 8000},
        {'app': 'B', 'num_requests': 100, 'sum_resp_time': 3000},
        {'app': 'B', 'num_requests': 10, 'sum_resp_time': 2000},
        {'app': 'C', 'num_requests': 1000, 'sum_resp_time': 500000},
        {'app': 'C', 'num_requests': 120, 'sum_resp_time': 52000},
    ]
)
(
    tdf
    .groupby(['app'])
    .agg(
        {
            'num_requests': ['sum'],
            'sum_resp_time': ['sum'],
            # 'average_resp_time': lambda x: x['sum_resp_time'].sum() / x['num_requests'].sum()
        }
    )
)
How can I get 'average_resp_time' working? It is currently commented out because my average calculation throws an error.
Upvotes: 0
Views: 1160
Reputation: 2472
I think you are missing axis=1. Assuming df2 is the already-aggregated frame (one row per app, flat column names), apply the division row by row:
df2['average_resp_time'] = df2.apply(lambda x: x['sum_resp_time'] / x['num_requests'], axis=1)
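To make the axis=1 point concrete, here is a small sketch. The name df2 and its contents are my assumption: I filled it with the per-app totals that the groupby sum produces for the question's data. It also shows that plain column division gives the same values without apply.

```python
import pandas as pd

# Assumed stand-in for df2: per-app totals, i.e. the result of the groupby sum
df2 = pd.DataFrame(
    {"num_requests": [220, 110, 1120], "sum_resp_time": [98000, 5000, 552000]},
    index=pd.Index(["A", "B", "C"], name="app"),
)

# axis=1 passes each row to the lambda, so both columns are visible at once
row_wise = df2.apply(lambda row: row["sum_resp_time"] / row["num_requests"], axis=1)

# Same values without apply: column arithmetic aligns on the index
vectorized = df2["sum_resp_time"] / df2["num_requests"]
```

On a frame this small the difference does not matter, but the vectorized form is usually the simpler choice.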
Upvotes: 0
Reputation: 2811
This is your groupby:
grouped = tdf.groupby(['app']).agg({'num_requests':'sum', 'sum_resp_time':'sum'})
To create the average_resp_time column, just do:
grouped['average_resp_time'] = grouped['sum_resp_time'] / grouped['num_requests']
     num_requests  sum_resp_time  average_resp_time
app
A             220          98000         445.454545
B             110           5000          45.454545
C            1120         552000         492.857143
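As for why the commented-out line fails: with a dict, agg treats each key as an existing column and passes that single column to the function, so a lambda there never sees two columns, and 'average_resp_time' is not a column of tdf in the first place. If you would rather keep everything in one chain, a possible sketch uses named aggregation (available from pandas 0.25, as far as I know) followed by assign. The variable name result is just my choice.

```python
import pandas as pd

# Same data as in the question, written column-wise for brevity
tdf = pd.DataFrame({
    "app": ["A", "A", "A", "B", "B", "C", "C"],
    "num_requests": [100, 50, 70, 100, 10, 1000, 120],
    "sum_resp_time": [50000, 40000, 8000, 3000, 2000, 500000, 52000],
})

result = (
    tdf.groupby("app")
    # Named aggregation: output_column=(input_column, function)
    .agg(
        num_requests=("num_requests", "sum"),
        sum_resp_time=("sum_resp_time", "sum"),
    )
    # The ratio is computed on the aggregated frame, where both sums exist
    .assign(average_resp_time=lambda d: d["sum_resp_time"] / d["num_requests"])
)
print(result)
```

Named aggregation also keeps the column names flat, whereas passing lists such as ['sum'] in the dict produces a two-level column index.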
Upvotes: 1