count distinct value in pandas dataframe

Question

I have a dataframe with 3 columns. I am trying to find distinct and aggregated values for each group in the row.

I am using a jupyter notebook for running this. Also using python libraries like numpy

For example my original dataframe 'df' is like:

    Name    Subject Grade
0   Tom     Math    20
1   Tom     Sci     30
2   Tom     Eng     10
3   Tom     Math    40
4   John    Math    30

I want to count the number of times a student gave a particular exam and the sum of grades of both the exam.

I applied the groupby query as follows:

new_df = df.groupby(['Name', 'Subject']).agg({ 'Grade': np.sum, 'Subject': np.size})

But this resulted in only 2 columns as expected. It did demonstrate the aggregate score for each subject for each student but in the form of rows.

I also tried using crosstab:

pd.crosstab(df.Name,df.Subject)

But I am not sure how can I incorporate the aggregation function of Sum.

My expected output is as follows:

    Name    Eng Math    Sci GradeEng    GradeMath   GradeSci
0   John    0   30      0   0           1           0
1   Tom     10  60      30  1           2           1

Vaishali · Accepted Answer

Its more of a pivot problem with two aggregate functions,

new_df = df.pivot_table(index = 'Name', columns = 'Subject', values = 'Grade', aggfunc = ['sum', 'size'], fill_value=0)\
.rename(columns = {'sum':'', 'size':'Grade'})

new_df.columns = new_df.columns.map(''.join)

new_df.reset_index(inplace = True)

    Name    Eng Math    Sci GradeEng    GradeMath   GradeSci
0   John    0   30      0   0           1           0
1   Tom     10  60      30  1           2           1

You can get the same result using groupby, you need to unstack the dataframe

df.groupby(['Name', 'Subject']).agg({ 'Grade': np.sum, 'Subject': np.size}).unstack().fillna(0).astype(int)

count distinct value in pandas dataframe

Answers (2)

Related Questions