Doing multiple calculation when we `groupby` in pandas

Question

As I learned so far, each time we do groupby on a dataframe in pandas we could do only one calculation, e.g. mean of columns.

How could we do multiple calculations (as we could do in SQL) in only one groupby in pandas.

For example for dataframe:

import pandas as pd
times = [21 , 34, 37, 40, 55, 65, 67, 84, 88, 90 , 91, 97, 104,105, 108]
names = ['bob', 'alice', 'bob', 'bob' , 'ali', 'alice', 'alice' , 'ali', 'moji', 'ali', 'moji', 'alice' , 'bob', 'bob', 'bob']
user_answer = [2 , 2 , 1 , 3 , 1 , 4 , 4 , 4 , 1 , 1 , 2 , 3 , 3 ,1 , 4]
correct_answer = [2 , 3 , 2 , 3 , 1 , 1 , 4 , 4 , 4 , 1 , 2 , 1 , 3 ,1 , 4]
df = pd.DataFrame({'name' : names , 'time' : times , 'user_answer'  : user_answer ,  'correct_answer'  : correct_answer})

how could I get a dataframe with three columns:

name: Each name repeated only one time (by doing groupby names)

mean_time: mean of times next to that name

score: How many time user_answer is equals to correct_answer for that user

With only one groupby code?

jezrael · Accepted Answer

Create helper column with compare values by Series.eq and aggregate by GroupBy.agg with named aggregations:

df1 = (df.assign(new = df.user_answer.eq(df.correct_answer))
         .groupby('name').agg(mean_time = ('time','mean'),
                             score=('new','sum'))
         .reset_index())
print (df1)
    name  mean_time  score
0    ali  76.333333      3
1  alice  65.750000      1
2    bob  69.166667      5
3   moji  89.500000      1

Doing multiple calculation when we `groupby` in pandas

Answers (2)

Related Questions