moshtaba
moshtaba

Reputation: 381

Doing multiple calculation when we `groupby` in pandas

As I learned so far, each time we do groupby on a dataframe in pandas we could do only one calculation, e.g. mean of columns.

How could we do multiple calculations (as we could do in SQL) in only one groupby in pandas.

For example for dataframe:

import pandas as pd
times = [21 , 34, 37, 40, 55, 65, 67, 84, 88, 90 , 91, 97, 104,105, 108]
names = ['bob', 'alice', 'bob', 'bob' , 'ali', 'alice', 'alice' , 'ali', 'moji', 'ali', 'moji', 'alice' , 'bob', 'bob', 'bob']
user_answer = [2 , 2 , 1 , 3 , 1 , 4 , 4 , 4 , 1 , 1 , 2 , 3 , 3 ,1 , 4]
correct_answer = [2 , 3 , 2 , 3 , 1 , 1 , 4 , 4 , 4 , 1 , 2 , 1 , 3 ,1 , 4]
df = pd.DataFrame({'name' : names , 'time' : times , 'user_answer'  : user_answer ,  'correct_answer'  : correct_answer})

how could I get a dataframe with three columns:

name: Each name repeated only one time (by doing groupby names)

mean_time: mean of times next to that name

score: How many time user_answer is equals to correct_answer for that user

With only one groupby code?

Upvotes: 1

Views: 1175

Answers (2)

Serial Lazer
Serial Lazer

Reputation: 1669

Something like this:

def f(gdf):
  res = pd.Series()
  res['mean_time'] = gdf['times'].mean()
  res['score'] = len(gdf[gdf['user_answer']==gdf['correct_answer']])
  return res

df.groupby(['names']).apply(f)

Upvotes: 1

jezrael
jezrael

Reputation: 862751

Create helper column with compare values by Series.eq and aggregate by GroupBy.agg with named aggregations:

df1 = (df.assign(new = df.user_answer.eq(df.correct_answer))
         .groupby('name').agg(mean_time = ('time','mean'),
                             score=('new','sum'))
         .reset_index())
print (df1)
    name  mean_time  score
0    ali  76.333333      3
1  alice  65.750000      1
2    bob  69.166667      5
3   moji  89.500000      1

Upvotes: 2

Related Questions