Reputation: 233
I have a 4000 record plus pandas dataframe with records for individual events by time stamp
Timestamp Date Holiday DayOfWeek
2017-01-01 02:25:00 2017-01-01 True Monday
2017-01-01 12:25:00 2017-01-01 True Monday
2017-01-02 03:45:00 2017-01-02 False Tuesday
2017-01-02 15:55:00 2017-01-02 False Tuesday
2017-02-03 01:01:00 2017-02-03 False Thursday
2017-02-03 4:25:00 2017-02-03 False Thursday
2017-04-03 4:25:00 2017-04-03 True Monday
What I'm trying to do is compare the means of events per day by day of the week and if it was on a holiday.
So for each day of the week, compare the the average number of events per day for when that day was a holiday vs when that day was NOT a holiday.
events.groupby(['DayOfWeek', 'Holiday']).count()
Will get me the number of events for each day of the week by holiday
DayOfWeek Holiday Count
Monday True 50
False 34
Tuesday True 32
False 23
...
But I can't figure out how to combine this with the number of events per individual date
events.groupby('Date').count()
Date Count
01-01-2017 2
01-02-2017 2
01-03-2017 4
....
I want a data frame more like
DayOfWeek Holiday Mean
Monday True 4.5
False 3.23
Tuesday True 2.1
False 3.2
...
And then ideally make a bar chart from it.
But can't figure out how to combine the operations to create what I want first.
Upvotes: 0
Views: 834
Reputation: 829
If I understand correctly, what you're looking for should be
df.groupby(['Date', 'DayOfWeek', 'Holiday']).count().reset_index().groupby(['DayOfWeek', 'Holiday']).mean()['Timestamp']
First we group by date
(and DayOfWeek
and Holiday
to preserve the columns - they will always be the same for any single date), count the records per date, reset the index, group by DayOfWeek
and Holiday
and calculate the mean.
For the sample data you provided this results in
DayOfWeek Holiday
Monday True 1.5
Thursday False 2.0
Tuesday False 2.0
Upvotes: 2