crackernutter
crackernutter

Reputation: 233

Pandas - groupby multiple columns and the compare averages of counts

I have a 4000 record plus pandas dataframe with records for individual events by time stamp

Timestamp            Date        Holiday  DayOfWeek
2017-01-01 02:25:00  2017-01-01  True      Monday
2017-01-01 12:25:00  2017-01-01  True      Monday
2017-01-02 03:45:00  2017-01-02  False     Tuesday
2017-01-02 15:55:00  2017-01-02  False     Tuesday
2017-02-03 01:01:00  2017-02-03  False     Thursday
2017-02-03 4:25:00   2017-02-03  False     Thursday
2017-04-03 4:25:00   2017-04-03  True      Monday

What I'm trying to do is compare the means of events per day by day of the week and if it was on a holiday.
So for each day of the week, compare the the average number of events per day for when that day was a holiday vs when that day was NOT a holiday.

events.groupby(['DayOfWeek', 'Holiday']).count()

Will get me the number of events for each day of the week by holiday

DayOfWeek Holiday  Count
Monday    True     50
          False    34
Tuesday   True     32
          False    23
...

But I can't figure out how to combine this with the number of events per individual date

events.groupby('Date').count()
Date        Count
01-01-2017  2
01-02-2017  2
01-03-2017  4
....

I want a data frame more like

DayOfWeek Holiday  Mean
Monday    True     4.5
          False    3.23
Tuesday   True     2.1
          False    3.2
...

And then ideally make a bar chart from it.

But can't figure out how to combine the operations to create what I want first.

Upvotes: 0

Views: 834

Answers (1)

andersource
andersource

Reputation: 829

If I understand correctly, what you're looking for should be

df.groupby(['Date', 'DayOfWeek', 'Holiday']).count().reset_index().groupby(['DayOfWeek', 'Holiday']).mean()['Timestamp']

First we group by date (and DayOfWeek and Holiday to preserve the columns - they will always be the same for any single date), count the records per date, reset the index, group by DayOfWeek and Holiday and calculate the mean.

For the sample data you provided this results in

DayOfWeek  Holiday
Monday     True       1.5
Thursday   False      2.0
Tuesday    False      2.0

Upvotes: 2

Related Questions