Reputation: 3392
I have the following DataFrame df, and I want to calculate the average hourly number of entries per day, grouped by runway:
year month day hour runway
2017 12 30 10 32L
2017 12 30 11 32L
2017 12 30 11 32L
2017 12 30 11 32L
2017 12 30 11 30R
2018 12 31 10 32L
2018 12 31 10 32L
2018 12 31 11 32L
2018 12 31 11 32L
The expected result is this one:
hour avg. count per hour
10 1.5
11 3
If I group by hour and call size(), I get the total count of entries per hour. But how can I get the average number of entries per hour?
df.groupby("hour").size()
I tried something like this, but it fails with an error:
s = df.groupby(["hour"])["month","day"].nunique()
df_arr = asma_df.groupby(["hour"]).size().reset_index()
df_arr[0]/df_arr["hour"].map(s)
UPDATE:
The indicated duplicate question is different from mine. I am asking about the average hourly count, not the total hourly count. Therefore it is not helpful.
Upvotes: 1
Views: 88
Reputation: 862481
I think you need to assign the result of the division, which is a Series, to a new column avg:
s = df.groupby(["hour"])["day"].nunique()
df_arr = df.groupby(["hour"]).size().reset_index(name='avg')
df_arr['avg'] /= df_arr["hour"].map(s)
#alternative
#df_arr = df_arr.assign(avg = df_arr['avg'] / df_arr["hour"].map(s))
print (df_arr)
hour avg
0 10 1.5
1 11 3.0
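One caveat: nunique on the day column counts distinct day numbers, not distinct calendar dates. It gives the right answer for the sample because the two dates happen to fall on different days of the month (30 and 31), but the 30th of two different months would be counted as one day. A minimal sketch that counts full dates instead, assuming the year, month and day columns together identify a date (the date column is a helper name introduced here, not part of the original data):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year":   [2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "month":  [12] * 9,
    "day":    [30, 30, 30, 30, 30, 31, 31, 31, 31],
    "hour":   [10, 11, 11, 11, 11, 10, 10, 11, 11],
    "runway": ["32L", "32L", "32L", "32L", "30R", "32L", "32L", "32L", "32L"],
})

# Combine year/month/day into one datetime column (hypothetical helper column).
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Entries per hour divided by the number of distinct dates seen for that hour.
g = df.groupby("hour")["date"]
df_arr = g.size().div(g.nunique()).reset_index(name="avg")
print(df_arr)
```

On the sample data this should give the same averages as above, since both dates already differ in their day number.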
Or divide the two Series and then create the DataFrame with reset_index:
g = df.groupby(["hour"])["day"]
df_arr = g.size().div(g.nunique()).reset_index(name='avg')
print (df_arr)
hour avg
0 10 1.5
1 11 3.0
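The question also mentions grouping by runway, although its expected output is per hour only. If per-runway averages are what is wanted, the same division can probably be extended by adding runway to the grouping keys. This is only a sketch under that assumption, and per_runway is a name chosen here for illustration:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year":   [2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "month":  [12] * 9,
    "day":    [30, 30, 30, 30, 30, 31, 31, 31, 31],
    "hour":   [10, 11, 11, 11, 11, 10, 10, 11, 11],
    "runway": ["32L", "32L", "32L", "32L", "30R", "32L", "32L", "32L", "32L"],
})

# Group by runway and hour, then divide the entry count by the number of
# distinct days on which that runway/hour combination appears.
g = df.groupby(["runway", "hour"])["day"]
per_runway = g.size().div(g.nunique()).reset_index(name="avg")
print(per_runway)
```

Note that a runway/hour pair is averaged only over the days on which it has at least one entry, so 30R at hour 11 is divided by one day, not two.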
And a solution that keeps the intermediate values, so you can check how the mean is computed:
df_arr = df.groupby(["hour"])["day"].agg(['size','nunique'])
df_arr['avg'] = df_arr['size'] / df_arr['nunique']
print (df_arr)
size nunique avg
hour
10 3 2 1.5
11 6 2 3.0
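Another way to read the size / nunique table is as a mean of per-day counts: first count the entries for each date and hour, then average those counts within each hour. A sketch of that two-step form, assuming year, month and day together identify the date (per_day_hour is a name introduced here for illustration):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year":   [2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "month":  [12] * 9,
    "day":    [30, 30, 30, 30, 30, 31, 31, 31, 31],
    "hour":   [10, 11, 11, 11, 11, 10, 10, 11, 11],
    "runway": ["32L", "32L", "32L", "32L", "30R", "32L", "32L", "32L", "32L"],
})

# Step 1: number of entries for every (date, hour) combination.
per_day_hour = df.groupby(["year", "month", "day", "hour"]).size()

# Step 2: average those per-day counts within each hour.
avg = per_day_hour.groupby(level="hour").mean().reset_index(name="avg")
print(avg)
```

Both forms average only over days on which the hour has at least one entry. Days with no entries for an hour do not appear in the data, so they are not counted as zeros.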
Upvotes: 2