Reputation: 355
I have a dataframe with 3 columns. I am trying to find distinct and aggregated values for each group in the row.
I am using a jupyter notebook for running this. Also using python libraries like numpy
For example my original dataframe 'df' is like:
Name Subject Grade
0 Tom Math 20
1 Tom Sci 30
2 Tom Eng 10
3 Tom Math 40
4 John Math 30
I want to count the number of times a student gave a particular exam and the sum of grades of both the exam.
I applied the groupby query as follows:
new_df = df.groupby(['Name', 'Subject']).agg({ 'Grade': np.sum, 'Subject': np.size})
But this resulted in only 2 columns as expected. It did demonstrate the aggregate score for each subject for each student but in the form of rows.
I also tried using crosstab:
pd.crosstab(df.Name,df.Subject)
But I am not sure how can I incorporate the aggregation function of Sum.
My expected output is as follows:
Name Eng Math Sci GradeEng GradeMath GradeSci
0 John 0 30 0 0 1 0
1 Tom 10 60 30 1 2 1
Upvotes: 2
Views: 711
Reputation: 38415
Its more of a pivot problem with two aggregate functions,
new_df = df.pivot_table(index = 'Name', columns = 'Subject', values = 'Grade', aggfunc = ['sum', 'size'], fill_value=0)\
.rename(columns = {'sum':'', 'size':'Grade'})
new_df.columns = new_df.columns.map(''.join)
new_df.reset_index(inplace = True)
Name Eng Math Sci GradeEng GradeMath GradeSci
0 John 0 30 0 0 1 0
1 Tom 10 60 30 1 2 1
You can get the same result using groupby, you need to unstack the dataframe
df.groupby(['Name', 'Subject']).agg({ 'Grade': np.sum, 'Subject': np.size}).unstack().fillna(0).astype(int)
Upvotes: 3
Reputation: 323326
Just modify you crosstab
, you can achieve what you need
s=pd.crosstab(df.Name,df.Subject,values=df.Grade,aggfunc=['sum','count']).swaplevel(0,1,axis=1).fillna(0)
s.columns=s.columns.map(''.join)
s
Engsum Mathsum Scisum Engcount Mathcount Scicount
Name
John 0.0 30.0 0.0 0.0 1.0 0.0
Tom 10.0 60.0 30.0 1.0 2.0 1.0
Upvotes: 2