tkr

Reputation: 11

Selecting rows based on the closest date in one column to a reference date in another column in pandas?

I have a pandas dataframe that contains a start date and a measurement date.

   Start Date  change   Individual    measured_date 
0  2004-11-23    3341        Bob       2007-07-26   
1  2006-06-29    3398        Bob       2007-07-26   
2  1997-07-21    2277       Greg       2005-04-21   
3  2000-04-11    3380      Nancy       2005-10-14   
4  2000-04-11    3380      Nancy       2007-06-28   
5  2005-03-29    3115      Nancy       2005-10-14   
6  2005-03-29    3115      Nancy       2007-06-28   
7  2005-10-15    4294      Nancy       2007-06-28   
8  2007-03-16    2163      Nancy       2007-06-28   
9  2006-02-18    2299       Jose       2009-12-23   
10 2008-11-16    1983       Jose       2009-12-23   
11 2009-04-07    2112       Jose       2009-12-23   
12 2009-11-14    2036       Jose       2009-12-23   
13 2009-11-24    2556       Jose       2009-12-23   

For each individual (the "Individual" column), I want to keep only the row with the shortest time between the start date and the measurement date. I created a "diff" column (measured_date minus Start Date) for viewing purposes. For example, the dataframe above should be filtered to:

Start Date  change  Individual  measured_date  diff
2006-06-29  3398    Bob         2007-07-26     392 days
1997-07-21  2277    Greg        2005-04-21     2831 days
2007-03-16  2163    Nancy       2007-06-28     104 days
2009-11-24  2556    Jose        2009-12-23     29 days
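The question does not show how the diff column was built. A minimal sketch, assuming both date columns are parsed with pd.to_datetime so that subtracting them yields timedelta values (the frame below simply re-enters the question's sample data):

```python
import pandas as pd

# The question's sample data; dates are parsed so subtraction yields Timedeltas.
df = pd.DataFrame({
    "Start Date": pd.to_datetime([
        "2004-11-23", "2006-06-29", "1997-07-21", "2000-04-11", "2000-04-11",
        "2005-03-29", "2005-03-29", "2005-10-15", "2007-03-16", "2006-02-18",
        "2008-11-16", "2009-04-07", "2009-11-14", "2009-11-24",
    ]),
    "change": [3341, 3398, 2277, 3380, 3380, 3115, 3115, 4294, 2163,
               2299, 1983, 2112, 2036, 2556],
    "Individual": ["Bob", "Bob", "Greg", "Nancy", "Nancy", "Nancy", "Nancy",
                   "Nancy", "Nancy", "Jose", "Jose", "Jose", "Jose", "Jose"],
    "measured_date": pd.to_datetime([
        "2007-07-26", "2007-07-26", "2005-04-21", "2005-10-14", "2007-06-28",
        "2005-10-14", "2007-06-28", "2007-06-28", "2007-06-28", "2009-12-23",
        "2009-12-23", "2009-12-23", "2009-12-23", "2009-12-23",
    ]),
})

# Elapsed time from start to measurement, stored as a timedelta64 column.
df["diff"] = df["measured_date"] - df["Start Date"]
print(df[["Individual", "diff"]])
```

In this sample every start date precedes its measurement date, so all diffs are positive. If that might not hold in the real data, taking df["diff"].abs() would measure closeness in either direction; that is an assumption beyond what the question states.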

Upvotes: 1

Views: 186

Answers (2)

rafaelc

Reputation: 59264

You can group by Individual and use nsmallest on the diff column:

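df.groupby("Individual")["diff"].nsmallest(1)

(Select the column with ["diff"]: on a GroupBy object, .diff refers to the GroupBy.diff() method rather than the column, so .diff.nsmallest fails.)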
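nsmallest returns only the diff values, indexed by (Individual, original row label). A minimal sketch of recovering the full rows from that second index level, using a few of the question's rows; the idxmin variant is an alternative shown for comparison, not part of this answer:

```python
import pandas as pd

# A few rows from the question, with the "diff" column computed.
df = pd.DataFrame({
    "Start Date": pd.to_datetime(["2004-11-23", "2006-06-29", "1997-07-21",
                                  "2005-10-15", "2007-03-16"]),
    "change": [3341, 3398, 2277, 4294, 2163],
    "Individual": ["Bob", "Bob", "Greg", "Nancy", "Nancy"],
    "measured_date": pd.to_datetime(["2007-07-26", "2007-07-26", "2005-04-21",
                                     "2007-06-28", "2007-06-28"]),
})
df["diff"] = df["measured_date"] - df["Start Date"]

# Series indexed by (Individual, original row label), one entry per group.
smallest = df.groupby("Individual")["diff"].nsmallest(1)

# The second index level holds the original row labels; use them to get full rows.
closest_rows = df.loc[smallest.index.get_level_values(1)]

# Alternative: idxmin returns the row label of each group's minimum directly.
closest_rows_alt = df.loc[df.groupby("Individual")["diff"].idxmin()]
print(closest_rows)
```

Both variants keep the first row when an individual has tied minimum diffs.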

Upvotes: 0

BENY

Reputation: 323226

Sort by diff in ascending order with sort_values, then use drop_duplicates to keep the first (shortest) row for each Individual:

df.sort_values('diff').drop_duplicates('Individual')
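A minimal sketch of the sort-then-deduplicate approach on a few of the question's rows. Two additions here are choices of this sketch, not part of the answer: kind="mergesort" (a stable sort, so tied diffs keep their original order before drop_duplicates picks the first) and a final sort_index to restore the original row order:

```python
import pandas as pd

# A few rows from the question, with the "diff" column computed.
df = pd.DataFrame({
    "Start Date": pd.to_datetime(["2004-11-23", "2006-06-29", "1997-07-21",
                                  "2005-10-15", "2007-03-16"]),
    "change": [3341, 3398, 2277, 4294, 2163],
    "Individual": ["Bob", "Bob", "Greg", "Nancy", "Nancy"],
    "measured_date": pd.to_datetime(["2007-07-26", "2007-07-26", "2005-04-21",
                                     "2007-06-28", "2007-06-28"]),
})
df["diff"] = df["measured_date"] - df["Start Date"]

# Ascending sort puts each Individual's shortest diff first;
# drop_duplicates keeps the first occurrence (keep="first" by default).
closest_rows = (
    df.sort_values("diff", kind="mergesort")
      .drop_duplicates("Individual")
      .sort_index()  # optional: restore the original row order
)
print(closest_rows)
```

With ascending=False the longest diff would come first, so each Individual would keep its farthest row instead of its closest.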

Upvotes: 1
