Reputation: 471
I have a dataframe with many columns, with two columns of focus in this operation. One column that contains duplicated names and one contains timedelta64. I would like to get the row which the timedelta64 column element is the closest to zero. The below sample would illustrate the operation better. Any help would be appreciated!! Thanks in advance!!
current df
Name Number Age Days
Tom 23 64 83 days
Tom 5 623 19 days
Tom 93 14 -5 days
Tom 28 12 75 days
Bess 7 56 -235 days
Bess 12 67 94 days
Bess 22 225 43 days
Max 6 64 88 days
Max 67 14 2 days
Max 26 11 -35 days
My real df have thousands of row and columns may contain numbers and strings.
Desired df
Name Number Age Days
Tom 93 14 -5 days
Bess 22 225 43 days
Max 67 14 2 days
Upvotes: 1
Views: 31
Reputation: 323356
You can do with argsort
df = df.iloc[df['Days'].abs().argsort()].drop_duplicates('Name')
Upvotes: 2
Reputation: 59579
Soon you will be able to sort based on a function, but for now we need to create a temporary column. Take the absolute value and sort, that way the lowest values appear first which allows you to drop_duplicates
on 'Name'
df['temp'] = df['Days'].abs()
df = df.sort_values('temp').drop_duplicates('Name').drop(columns='temp')
Name Number Age Days
8 Max 67 14 2 days
2 Tom 93 14 -5 days
6 Bess 22 225 43 days
Upvotes: 2