Reputation:

how to group by multiple columns

I want to group by my dataframe by different columns based on UserId,Date,category (frequency of use per day ) ,max duration per category ,and the part of the day when it is most used and finally store the result in a .csv file.

 name     duration  UserId  category    part_of_day    Date 
 Settings   3.436    1    System tool      evening   2020-09-10 
 Calendar   2.167    1    Calendar         night     2020-09-11 
 Calendar   5.705    1    Calendar         night     2020-09-11 
 Messages   7.907    1   Phone_and_SMS     night     2020-09-11 
Instagram   50.285   9   Social            night    2020-09-28  
Drive       30.260   9  Productivity       night    2020-09-28   

df.groupby(["UserId", "Date","category"])["category"].count()

my code result is :

  UserId      Date        category               
    1       2020-09-10    System tool                  1
           2020-09-11     Calendar                     8
                         Clock                         2
                        Communication                  86
                      Health & Fitness                 5

But i want this result

  UserId      Date        category             count(category)  max-duration 
    1       2020-09-10  System tool                  1            3

            2020-09-11  Calendar                     2            5

    2       2020-09-28    Social                     1            50
                          Productivity               1            30

How can I do that? I can not find the wanted result for any solution

Upvotes: 0

Answers (3)

n1tk

Reputation: 2500

You take advantage of pandas.DataFrame.groupby , pandas.DataFrame.aggregate and pandas.DataFrame.rename in following format to generate your desired output in one line:

code:

import pandas as pd

df = pd.DataFrame({'name': ['Settings','Calendar','Calendar', 'Messages', 'Instagram', 'Drive'],
                   'duration': [3.436, 2.167, 5.7050, 7.907, 50.285, 30.260],
                   'UserId': [1, 1, 1, 1, 2, 2],
                   'category' : ['System_tool', 'Calendar', 'Calendar', 'Phone_and_SMS', 'Social', 'Productivity'],
                   'part_of_day' : ['evening', 'night','night','night','night','night' ],
                   'Date' : ['2020-09-10', '2020-09-11', '2020-09-11', '2020-09-11', '2020-09-28', '2020-09-28'] })

df.groupby(['UserId', 'Date', 'category']).aggregate( count_cat = ('category', 'count'), max_duration = ('duration', 'max'))

out:

Upvotes: 0

Jason Cook

Reputation: 1501

It looks like you might be wanting to calculate statistics for each group.

grouped = df.groupby(["UserId", "Date","category"])

result = grouped.agg({'category': 'count', 'duration': 'max'})
result.columns = ['group_count','duration_max']

result = result.reset_index()

result
   UserId        Date       category  group_count  duration_max
0       1  2020-09-10    System tool            1         3.436
1       1  2020-09-11       Calendar            2         5.705
2       1  2020-09-11  Phone_and_SMS            1         7.907
3       9  2020-09-28   Productivity            1        30.260
4       9  2020-09-28         Social            1        50.285

Upvotes: 1

Ami Tavory

Reputation: 76297

From your question, it looks like you'd like to make a table with each combination and the count. For this, you might consider using the as_index parameter in groupby:

df.category.groupby(["UserId", "Date"], as_index=False).count()

Upvotes: 1

how to group by multiple columns

Answers (3)

code:

out:

Related Questions