Matthias Gallagher
Matthias Gallagher

Reputation: 471

Keep row which the element in specific column (in timedelta64) is closest to zero

I have a dataframe with many columns, with two columns of focus in this operation. One column that contains duplicated names and one contains timedelta64. I would like to get the row which the timedelta64 column element is the closest to zero. The below sample would illustrate the operation better. Any help would be appreciated!! Thanks in advance!!

current df

Name     Number     Age     Days
Tom       23         64     83 days
Tom       5          623    19 days
Tom       93         14     -5 days
Tom       28         12     75 days
Bess      7          56     -235 days
Bess      12         67     94 days
Bess      22         225    43 days
Max       6          64     88 days
Max       67         14     2  days
Max       26         11     -35 days

My real df have thousands of row and columns may contain numbers and strings.

Desired df

Name     Number     Age     Days
Tom       93         14     -5 days
Bess      22         225    43 days
Max       67         14     2  days

Upvotes: 1

Views: 31

Answers (2)

BENY
BENY

Reputation: 323356

You can do with argsort

df = df.iloc[df['Days'].abs().argsort()].drop_duplicates('Name')

Upvotes: 2

ALollz
ALollz

Reputation: 59579

Soon you will be able to sort based on a function, but for now we need to create a temporary column. Take the absolute value and sort, that way the lowest values appear first which allows you to drop_duplicates on 'Name'

df['temp'] = df['Days'].abs()

df = df.sort_values('temp').drop_duplicates('Name').drop(columns='temp')

   Name  Number  Age    Days
8   Max      67   14  2 days
2   Tom      93   14 -5 days
6  Bess      22  225 43 days

Upvotes: 2

Related Questions