Reputation: 1
I have a csv file titled "Work.csv". The time a person registers in for work for the month of October (there's many entries).
MONTH YEAR DATE TIME
Oct 2011 20/10/2011 22:04:52
Oct 2011 26/10/2011 11:15:16
Oct 2011 11/10/2011 10:52:52
Oct 2011 11/10/2011 16:40:06
I wish to find the average number of people who registers in for work in every hourly interval. eg. The average number of people who registers in from 12am to 1am, 1am to 2am... and so on.
I tried my own code using dt.hour and only managed to get the frequency count for the total number of people in the specific hour.
I can't seem to get the average.
My desired output is:
Hour Average Number of People
0 40
1 50
2 65
3 30
4 30
Please help!
Upvotes: 0
Views: 81
Reputation: 57105
First, you should convert your TIME column to DateTime. Then group by time and date to find out how many people worked on each date at each hour. Finally, group by the hour again and get the mean:
df.groupby([pd.to_datetime(df.TIME).dt.hour, df.DATE]).size()\ # raw counts
.groupby('TIME').mean() # mean counts
#TIME
#10 1
#11 1
#16 1
#22 1
A more compact (and possibly less efficient) option, thanks to @Wen (who is always helpful):
df.groupby([pd.to_datetime(df.TIME).dt.hour, df.DATE]).size()\
.mean(level='TIME')
Upvotes: 1
Reputation: 1814
First your need to count all the registrations and store it in df then use mean() function for 1h interval and that should do it. Something like this:
df_registered['count'] = df_input['month'].resample("1h").count()
df_registered['avg'] = df_registered['count'].resample("1h").mean()
For resample to work you first need to pick an index.
Upvotes: 0