naina
naina

Reputation: 11

pandas group by on column values and extract one column text

I have result, student, version and status columns. in this I want to do group by using Student , Version and result = pass count and result = fail count

similarly like df.groupby(["student", "version", "result=pass"]).size().reset_index(name="new_result")

the below is my data frame

result student version status Failed Subject
pass Student-A L-1.0 Active
fail Student-A L-1.0 Active Mathematics
fail Student-A L-1.0 Active Physics
pass Student-A M-1.0 Active
fail Student-A M-1.0 Active Science
pass Student-B N-1.0 Active
pass Student-B N-1.0 Active
pass Student-B N-1.0 Active
pass Student-C O-1.0 Active
pass Student-C O-1.0 Active
fail Student-C O-1.0 Active English
fail Student-C P-1.0 Active Computers
fail Student-C P-1.0 Active Mathematics

I want my output data frame as below:

student version pass_count fail_count status total_count (pass+fail) Failed Subject
Student-A L-1.0 1 2 Active 3 Mathematics,Physics
Student-A M-1.0 1 1 Active 2 Science
Student-B N-1.0 3 0 Active 3
Student-C O-1.0 1 1 Active 2 English
Student-C P-1.0 0 2 Active 2 Computers,Mathematics

I am able to get pass and fail count using the below but not total count, anyone pls help

pd.pivot_table(master_df, index=['status', 'student', 'version'], columns=['result'], aggfunc=len, fill_value=0)

Upvotes: 1

Views: 1532

Answers (1)

SeaBean
SeaBean

Reputation: 23217

You can use .groupby() + agg(), as follows:

df_out = (df.groupby(['student', 'version', 'status'], as_index=False)
            .agg(**{'pass_count': ('result', lambda x: x[x == 'pass'].size),
                    'fail_count': ('result', lambda x: x[x == 'fail'].size),                    
                    'total_count': ('result', 'size'),
                    'Failed Subject': ('Failed Subject', lambda x: ','.join(x.dropna()))
                    })
          )

Result:

print(df_out)

     student version  status  pass_count  fail_count  total_count         Failed Subject
0  Student-A   L-1.0  Active           1           2            3    Mathematics,Physics
1  Student-A   M-1.0  Active           1           1            2                Science
2  Student-B   N-1.0  Active           3           0            3                       
3  Student-C   O-1.0  Active           2           1            3                English
4  Student-C   P-1.0  Active           0           2            2  Computers,Mathematics

Upvotes: 2

Related Questions