Reputation: 581
I have pandas df as shown
Name Subject Score
Rakesh Math 65
Mukesh Science 76
Bhavesh French 87
Rakesh Science 88
Rakesh Hindi 76
Sanjay English 66
Mukesh English 98
Mukesh Marathi 77
I have to make another df including students who took two or more subjects and total their scores in each subjects.
Hence the resultant df will be as shown:
Upvotes: 0
Views: 56
Reputation: 8229
Using groupby
, filter
and agg
we can do it in one line:
(df.groupby('Name')
.filter(lambda g:len(g)>1)
.groupby('Name')
.agg({'Subject': ', '.join, 'Score':'sum'})
)
output
Subject Score
Name
Mukesh Science, English, Marathi 251
Rakesh Math, Science, Hindi 229
Upvotes: 1
Reputation: 3133
In pandas, there is a method explode
that will take a column that contains lists and break them apart. We can do a sort of opposite of that by making list of your Subjects column. I pulled the idea here from another question.
In [1]: df = df.groupby('Name').agg({'Subject': lambda x: x.tolist(), 'Score':'sum'})
In [2]: df
Out[2]:
Subject Score
Name
Bhavesh [French] 87
Mukesh [Science, English, Marathi] 251
Rakesh [Math, Science, Hindi] 229
Sanjay [English] 66
We can then filter on the Subject
column for any row where the list has more than one item. This method I lifted from another SO question.
In [3]: df[df['Subject'].str.len() > 1]
Out[3]:
Subject Score
Name
Mukesh [Science, English, Marathi] 251
Rakesh [Math, Science, Hindi] 229
If you want the Subject
column to be a string instead of a list, you can utulize this third other-answer from SO.
df['Subject'] = df['Subject'].apply(lambda x: ", ".join(x))
Upvotes: 2