Reputation:
I want to group by my dataframe by different columns based on UserId,Date,category (frequency of use per day ) ,max duration per category ,and the part of the day when it is most used and finally store the result in a .csv file.
name duration UserId category part_of_day Date
Settings 3.436 1 System tool evening 2020-09-10
Calendar 2.167 1 Calendar night 2020-09-11
Calendar 5.705 1 Calendar night 2020-09-11
Messages 7.907 1 Phone_and_SMS night 2020-09-11
Instagram 50.285 9 Social night 2020-09-28
Drive 30.260 9 Productivity night 2020-09-28
df.groupby(["UserId", "Date","category"])["category"].count()
my code result is :
UserId Date category
1 2020-09-10 System tool 1
2020-09-11 Calendar 8
Clock 2
Communication 86
Health & Fitness 5
But i want this result
UserId Date category count(category) max-duration
1 2020-09-10 System tool 1 3
2020-09-11 Calendar 2 5
2 2020-09-28 Social 1 50
Productivity 1 30
How can I do that? I can not find the wanted result for any solution
Upvotes: 0
Views: 177
Reputation: 2500
You take advantage of pandas.DataFrame.groupby , pandas.DataFrame.aggregate and pandas.DataFrame.rename in following format to generate your desired output in one line:
import pandas as pd
df = pd.DataFrame({'name': ['Settings','Calendar','Calendar', 'Messages', 'Instagram', 'Drive'],
'duration': [3.436, 2.167, 5.7050, 7.907, 50.285, 30.260],
'UserId': [1, 1, 1, 1, 2, 2],
'category' : ['System_tool', 'Calendar', 'Calendar', 'Phone_and_SMS', 'Social', 'Productivity'],
'part_of_day' : ['evening', 'night','night','night','night','night' ],
'Date' : ['2020-09-10', '2020-09-11', '2020-09-11', '2020-09-11', '2020-09-28', '2020-09-28'] })
df.groupby(['UserId', 'Date', 'category']).aggregate( count_cat = ('category', 'count'), max_duration = ('duration', 'max'))
Upvotes: 0
Reputation: 1501
It looks like you might be wanting to calculate statistics for each group.
grouped = df.groupby(["UserId", "Date","category"])
result = grouped.agg({'category': 'count', 'duration': 'max'})
result.columns = ['group_count','duration_max']
result = result.reset_index()
result
UserId Date category group_count duration_max
0 1 2020-09-10 System tool 1 3.436
1 1 2020-09-11 Calendar 2 5.705
2 1 2020-09-11 Phone_and_SMS 1 7.907
3 9 2020-09-28 Productivity 1 30.260
4 9 2020-09-28 Social 1 50.285
Upvotes: 1
Reputation: 76297
From your question, it looks like you'd like to make a table with each combination and the count. For this, you might consider using the as_index
parameter in groupby
:
df.category.groupby(["UserId", "Date"], as_index=False).count()
Upvotes: 1