Reputation: 97
By using Pandas groupby, I have data on how much activity certain users have on average any given day of the week. Grouped by user and day, I compute max and mean for several users in the last 30 days.
Now I want to find, for every user, which day of the week corresponds to their daily max activity, and what is the average magnitude of that activity.
What is the method in pandas to perform such a task?
The original data looks something like this:
userID countActivity weekday
0 3 25 5
1 3 58 6
2 3 778 0
3 3 78208 1
4 3 6672 2
The object that has these groups is created from the following:
aggregations = {
'countActivity': {
'maxDaily': 'max',
'meanDaily': 'mean'
}
}
dailyAggs = df.groupby(['userID','weekday']).agg(aggregations)
The groupby object looks something like this:
countActivity
maxDaily meanDaily
userID weekday
3 0 84066 18275.6
1 78208 20698.5
2 172579 64930.75
3 89535 25443
4 6152 2809
Pandas groupby method filter
seems to be needed here, but I'm stumped how on how to proceed.
Upvotes: 1
Views: 1007
Reputation: 33793
I'd first do a groupby
on 'userID'
, and then write an apply
function to do the rest. The apply
function will take a 'userID'
group, perform another groupby
on 'weekday'
to do your aggregations, and then only return the row that contains the maximum value for maxDaily
, which can be found with argmax
.
def get_max_daily(grp):
aggregations = {'countActivity': {'maxDaily': 'max', 'meanDaily': 'mean'}}
grp = grp.groupby('weekday').agg(aggregations).reset_index()
return grp.loc[grp[('countActivity', 'maxDaily')].argmax()]
result = df.groupby('userID').apply(get_max_daily)
I've added a row to your sample data to make sure the daily aggregations were working correctly, since your sample data only contains one entry per weekday:
userID countActivity weekday
0 3 25 5
1 3 58 6
2 3 778 0
3 3 78208 1
4 3 6672 2
5 3 78210 1
The resulting output:
weekday countActivity
meanDaily maxDaily
userID
3 1 78209 78210
Upvotes: 5