Reputation: 388
This code looks really stupid but this is a basic representation of the problem I've been dealing with all day - I have 3 columns, type, day and month. I'd like to count the number of dogs/cats by day, and then average it out over the month.
import numpy as np
import pandas as pd
data = {'Type':['Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog', 'Cat'], 'Day':[1, 1, 2, 2, 3, 3, 4, 4], 'Month': [1, 1, 1, 1, 2, 2, 2, 2]}
newDF = pd.DataFrame(data)
Which creates a dataframe that looks like this:
Type|Day|Month
---------
Dog|1|1
Cat|1|1
Cat|2|1
Cat|2|1
Dog|3|2
Dog|3|2
Dog|4|2
Cat|4|2
What I'm trying to do here is create a table below showing this:
Type | Month1 | Month2
------------------------
Dog | 1 | 1.5
Cat | 1.5 | 1
So basically, I just want to use some combination of pivot table or groupby to create a pivot_table containing the count of number of cats / dogs per day, and then average that out over the month. For some reason, I just can't manage to figure it out. Can someone smart enough with pandas please help? Thank you!
Upvotes: 2
Views: 51
Reputation: 5460
Just a groupby
combined with an unstack
and mean
:
df.groupby(df.columns.tolist()) \
.size() \
.unstack(level='Day') \
.mean(axis=1) \
.unstack(level='Month')
Output:
Month 1 2
Type
Cat 1.5 1.0
Dog 1.0 1.5
Upvotes: 2
Reputation: 59549
Two groupbys
+ unstack
(newDF.groupby(['Type', 'Day', 'Month']).size()
.groupby(level=[0,2]).mean()
.unstack()
.add_prefix('Month').rename_axis(None, 1))
Output:
Month1 Month2
Type
Cat 1.5 1.0
Dog 1.0 1.5
Upvotes: 2