jibecat
jibecat

Reputation: 25

Pandas Return values based on a range of data with condition

I am working using python and pandas with the dataset below

Name Subject Grade
Alex Science A
Bob Maths B
Bob Maths C
Cynthia Science C
Dylan Geography A
Dylan Geography A

I would like to find names and subjects that have been repeated twice and display them exactly like the table below. The grades will only get concatenated by a comma if required.

Name Subject Grade
Bob Maths B,C
Dylan Geography A

Any help would be greatly appreciated

Upvotes: 1

Views: 45

Answers (1)

sophocles
sophocles

Reputation: 13821

You could find your duplicated rows based on Name and Subject columns using duplicated, and then use groupby with a custom join.

set is used to ensure a single Grade is returned when Grades are the same.

key_cols = ['Name','Subject']
df[df.duplicated(key_cols, keep=False)].groupby(key_cols,as_index=False).agg({'Grade':lambda x: ','.join(set(x))})

prints:

    Name    Subject Grade
0    Bob      Maths   C,B
1  Dylan  Geography     A

Upvotes: 3

Related Questions