Combine rows in pandas df as per given condition

Question

I have pandas df as shown

 Name     Subject   Score
 Rakesh   Math      65
 Mukesh   Science   76
 Bhavesh  French    87
 Rakesh   Science   88
 Rakesh   Hindi     76
 Sanjay   English   66
 Mukesh   English   98
 Mukesh   Marathi   77

I have to make another df including students who took two or more subjects and total their scores in each subjects.

Hence the resultant df will be as shown:

scotscotmcc · Accepted Answer

In pandas, there is a method explode that will take a column that contains lists and break them apart. We can do a sort of opposite of that by making list of your Subjects column. I pulled the idea here from another question.

In [1]: df =  df.groupby('Name').agg({'Subject': lambda x: x.tolist(), 'Score':'sum'})

In [2]: df

Out[2]:
                             Subject  Score
Name
Bhavesh                     [French]     87
Mukesh   [Science, English, Marathi]    251
Rakesh        [Math, Science, Hindi]    229
Sanjay                     [English]     66

We can then filter on the Subject column for any row where the list has more than one item. This method I lifted from another SO question.

In [3]: df[df['Subject'].str.len() > 1]
Out[3]:
                            Subject  Score
Name
Mukesh  [Science, English, Marathi]    251
Rakesh       [Math, Science, Hindi]    229

If you want the Subject column to be a string instead of a list, you can utulize this third other-answer from SO.

df['Subject'] = df['Subject'].apply(lambda x: ", ".join(x))

Combine rows in pandas df as per given condition

Answers (2)

Related Questions