Reputation: 381
As I learned so far, each time we do groupby
on a dataframe in pandas we could do only one calculation, e.g. mean of columns.
How could we do multiple calculations (as we could do in SQL) in only one groupby
in pandas.
For example for dataframe:
import pandas as pd
times = [21 , 34, 37, 40, 55, 65, 67, 84, 88, 90 , 91, 97, 104,105, 108]
names = ['bob', 'alice', 'bob', 'bob' , 'ali', 'alice', 'alice' , 'ali', 'moji', 'ali', 'moji', 'alice' , 'bob', 'bob', 'bob']
user_answer = [2 , 2 , 1 , 3 , 1 , 4 , 4 , 4 , 1 , 1 , 2 , 3 , 3 ,1 , 4]
correct_answer = [2 , 3 , 2 , 3 , 1 , 1 , 4 , 4 , 4 , 1 , 2 , 1 , 3 ,1 , 4]
df = pd.DataFrame({'name' : names , 'time' : times , 'user_answer' : user_answer , 'correct_answer' : correct_answer})
how could I get a dataframe with three columns:
name: Each name repeated only one time (by doing groupby
names)
mean_time: mean
of times next to that name
score: How many time user_answer
is equals to correct_answer
for that user
With only one groupby
code?
Upvotes: 1
Views: 1175
Reputation: 1669
Something like this:
def f(gdf):
res = pd.Series()
res['mean_time'] = gdf['times'].mean()
res['score'] = len(gdf[gdf['user_answer']==gdf['correct_answer']])
return res
df.groupby(['names']).apply(f)
Upvotes: 1
Reputation: 862751
Create helper column with compare values by Series.eq
and aggregate by GroupBy.agg
with named aggregations:
df1 = (df.assign(new = df.user_answer.eq(df.correct_answer))
.groupby('name').agg(mean_time = ('time','mean'),
score=('new','sum'))
.reset_index())
print (df1)
name mean_time score
0 ali 76.333333 3
1 alice 65.750000 1
2 bob 69.166667 5
3 moji 89.500000 1
Upvotes: 2