Reputation: 137
I'm trying to filter and sort a Pandas dataframe to clean my data. I've looked on StackOverflow and can't seem to find a method that will give me the sort and filter I need. The data I'm working with looks something like this:
| Name 1 | Name 2 | Score |
| ------ | ------ | ----- |
| Amy | Jack | 2.456 |
| Amy | Jack | 3.234 |
| Amy | Jack | 5.124 |
| ... | ... | ... |
| Max | Jane | 8.569 |
| Max | Jane | 4.654 |
| Max | Jane | 6.349 |
What I want to do is make a new dataframe containing only the lowest score for every pair of names. So the resulting dataframe would be something like this:
| Name 1 | Name 2 | Score |
| ------ | ------ | ----- |
| Amy | Jack | 2.456 |
| ... | ... | ... |
| Max | Jane | 4.654 |
Upvotes: 1
Views: 1393
Reputation: 24324
You can also use the sort_values() and groupby() methods:
df.sort_values(by='Score').groupby(['Name 1', 'Name 2'], as_index = False).first()
OR
Use the sort_values() and drop_duplicates() methods:
df.sort_values(by='Score').drop_duplicates(subset=['Name 1', 'Name 2'])
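For anyone who wants to try both one-liners, here is a minimal runnable sketch. The sample frame is rebuilt from the six rows shown in the question (the elided "..." rows are left out), and the names via_groupby and via_dedup are just illustrative. The reset_index() call is an addition, not part of the answer; it only renumbers the rows.

```python
import pandas as pd

# Sample data rebuilt from the question's table (elided rows omitted).
df = pd.DataFrame({
    "Name 1": ["Amy", "Amy", "Amy", "Max", "Max", "Max"],
    "Name 2": ["Jack", "Jack", "Jack", "Jane", "Jane", "Jane"],
    "Score": [2.456, 3.234, 5.124, 8.569, 4.654, 6.349],
})

# Option 1: sort ascending by score, then take the first row of each pair.
via_groupby = (
    df.sort_values(by="Score")
      .groupby(["Name 1", "Name 2"], as_index=False)
      .first()
)

# Option 2: sort ascending by score, then keep the first occurrence of each pair.
# reset_index(drop=True) is an addition here so the index runs 0..n-1.
via_dedup = (
    df.sort_values(by="Score")
      .drop_duplicates(subset=["Name 1", "Name 2"])
      .reset_index(drop=True)
)

print(via_groupby)
print(via_dedup)
```

On this sample both give the same two rows. In general the row order can differ: groupby() orders the result by the group keys, while drop_duplicates() keeps the score-sorted order. Also note that first() picks the first non-null value per column, so the two can disagree when other columns contain NaN.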
Upvotes: 3
Reputation: 2128
Use:
df = df.groupby(['Name 1', 'Name 2'], as_index = False).agg(Score = ('Score', 'min'))
Output:
>>> df
Name 1 Name 2 Score
0 Amy Jack 2.456
1 Max Jane 4.654
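A self-contained sketch of the same call, using the six rows visible in the question as sample data (the elided rows are omitted). The keyword form agg(Score=('Score', 'min')) is named aggregation, which needs pandas 0.25 or later. The second part is a variant that is not in the answer above: if the real frame has more columns that should come along with the minimum, idxmin() can be used to select the whole rows. The names lowest and rows are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question's table (elided rows omitted).
df = pd.DataFrame({
    "Name 1": ["Amy", "Amy", "Amy", "Max", "Max", "Max"],
    "Name 2": ["Jack", "Jack", "Jack", "Jane", "Jane", "Jane"],
    "Score": [2.456, 3.234, 5.124, 8.569, 4.654, 6.349],
})

# Named aggregation: output column "Score" = min of input column "Score".
lowest = df.groupby(["Name 1", "Name 2"], as_index=False).agg(Score=("Score", "min"))
print(lowest)

# Variant (an assumption about what may be needed, not from the answer):
# idxmin() returns the index label of each group's minimum score,
# and .loc pulls those complete rows from the original frame.
rows = df.loc[df.groupby(["Name 1", "Name 2"])["Score"].idxmin()]
print(rows)
```

The aggregation returns only the grouping columns plus Score, which is enough for the frame in the question. The idxmin() variant keeps the original index labels, so it is handy when you need to trace results back to the source rows.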
Upvotes: 3