Reputation: 3392
I have the following DataFrame df, and I want to calculate the average number of entries per hour for each year, grouped by runway:
year month day hour runway
2017 12 30 10 32L
2017 12 30 11 32L
2017 12 30 11 32L
2017 12 30 11 32L
2017 12 30 11 30R
2018 12 30 10 32L
2018 12 30 10 32L
2018 12 30 11 32L
2018 12 30 11 32L
The expected result is this one:
year runway avg. count per hour
2017 32L 2
2017 30R 0.5
2018 32L 2
2018 30R 0
I tried this, but it does not calculate the average count per hour:
result = df.groupby(['year','runway']).count()
Upvotes: 1
Views: 524
Reputation: 30605
Here's one way to do it: divide the number of entries per year and runway by the number of unique hours seen in that year.
# Count the unique hours per year
s = df.groupby(['year'])['hour'].nunique()
# Count the entries per year and runway (reset_index names the size column 0)
n = df.groupby(['year','runway']).size().reset_index()
# Divide the entry counts by the number of unique hours in the matching year
n['avg'] = n[0]/n['year'].map(s)
Output:
   year runway  0  avg
0  2017    30R  1  0.5
1  2017    32L  4  2.0
2  2018    32L  4  2.0
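Two things may matter on real data. First, nunique on 'hour' counts distinct hour-of-day values, so it can never exceed 24 per year. Second, year/runway pairs with no entries (such as 30R in 2018) do not appear in the result at all. The sketch below is one possible variant, not the only way to do it. It assumes that an "hour" means a distinct (month, day, hour) slot that has at least one entry, and that missing pairs should show up with a count of 0. The names slots, full_index and avg_count_per_hour are made up for this example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year":   [2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "month":  [12] * 9,
    "day":    [30] * 9,
    "hour":   [10, 11, 11, 11, 11, 10, 10, 11, 11],
    "runway": ["32L", "32L", "32L", "32L", "30R", "32L", "32L", "32L", "32L"],
})

# Assumption: one "hour" is a distinct (month, day, hour) slot within a year,
# and only slots with at least one entry are counted.
slots = (
    df.drop_duplicates(["year", "month", "day", "hour"])
      .groupby("year")
      .size()
)

# Every year/runway combination, so that pairs without entries get a 0 count.
full_index = pd.MultiIndex.from_product(
    [df["year"].unique(), df["runway"].unique()],
    names=["year", "runway"],
)
counts = (
    df.groupby(["year", "runway"])
      .size()
      .reindex(full_index, fill_value=0)
)

# Divide each count by the number of hour slots in the matching year.
result = counts.rename("count").reset_index()
result["avg_count_per_hour"] = result["count"] / result["year"].map(slots)
print(result)
```

On the sample data this gives 2.0 for 32L and 0.5 for 30R in 2017, and 2.0 for 32L and 0.0 for 30R in 2018. If hours without any entry should also count toward the average, slots would have to be replaced by the total number of hours in the period of interest.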
Upvotes: 3