Ninsuke
Ninsuke

Reputation: 1

Finding the Average number of people by Hourly intervals, Panda/Numpy

I have a csv file titled "Work.csv". The time a person registers in for work for the month of October (there's many entries).

MONTH   YEAR    DATE      TIME
Oct    2011  20/10/2011  22:04:52
Oct    2011  26/10/2011  11:15:16
Oct    2011  11/10/2011  10:52:52
Oct    2011  11/10/2011  16:40:06

I wish to find the average number of people who registers in for work in every hourly interval. eg. The average number of people who registers in from 12am to 1am, 1am to 2am... and so on.

I tried my own code using dt.hour and only managed to get the frequency count for the total number of people in the specific hour.

I can't seem to get the average.

My desired output is:

Hour        Average Number of People
0                    40
1                    50
2                    65
3                    30
4                    30

Please help!

Upvotes: 0

Views: 81

Answers (2)

DYZ
DYZ

Reputation: 57105

First, you should convert your TIME column to DateTime. Then group by time and date to find out how many people worked on each date at each hour. Finally, group by the hour again and get the mean:

df.groupby([pd.to_datetime(df.TIME).dt.hour, df.DATE]).size()\ # raw counts
  .groupby('TIME').mean() # mean counts
#TIME
#10    1
#11    1
#16    1
#22    1

A more compact (and possibly less efficient) option, thanks to @Wen (who is always helpful):

df.groupby([pd.to_datetime(df.TIME).dt.hour, df.DATE]).size()\
  .mean(level='TIME') 

Upvotes: 1

krizajb
krizajb

Reputation: 1814

First your need to count all the registrations and store it in df then use mean() function for 1h interval and that should do it. Something like this:

  df_registered['count'] = df_input['month'].resample("1h").count()
  df_registered['avg'] = df_registered['count'].resample("1h").mean()

For resample to work you first need to pick an index.

Upvotes: 0

Related Questions