MD Rijwan
MD Rijwan

Reputation: 481

Find second most recent date in a dataframe column

I have data for example:

  1. Sampled_Date
  2. 8/29/2017
  3. 8/29/2017
  4. 8/29/2017
  5. 2/28/2016
  6. 2/28/2016
  7. 5/15/2014

Etc.. Now I can find max and min dates as

df.Sampled_Date.max()
df.Sampled_Date.min()

But how to find the second most recent date. i.e 2/28/2016 in Python's pandas data frame.

Upvotes: 1

Views: 6250

Answers (3)

Paul Meinshausen
Paul Meinshausen

Reputation: 789

I know this is an extension of the question, but it's something I frequently need and sometimes forget, so I'm sharing here:

Let's say instead of just wanting the second most recent or second earliest dates for an entire dataframe, you have a dataframe of users and dates, and you want to get the second earliest date for each user (e.g. their second transaction).

Example dataframe:

test = pd.DataFrame()
test['users'] = [1,2,3,2,3,2]
test['dates'] = pd.to_datetime(['2019-01-01','2019-01-01',
                                '2019-01-02','2019-01-02',
                                '2019-01-03','2019-01-04'])

The earliest date for user 2 is '2019-01-01' and the second earliest date is '20-19-01-02'. We can use groupby, apply, and nlargest/nsmallest:

test.groupby('users')['dates'].apply(lambda x: x.nsmallest(2).max())

which gives us this output:

users
1   2019-01-01
2   2019-01-02
3   2019-01-03
Name: dates, dtype: datetime64[ns]

Upvotes: 3

Deena
Deena

Reputation: 6213

You can also use .argsort()

import pandas as pd

# Generate dates
dates = pd.Series(pd.date_range(start='1/1/2017', periods=5, freq=pd.offsets.MonthEnd(3)))

# Random order
dates = dates.sample(frac=1, random_state=0)

# Get the second 'max' date
dates[dates.argsort() == (len(dates)-2)] # 3   2017-10-31

Upvotes: 1

sacuL
sacuL

Reputation: 51335

Make sure your dates are in datetime first:

df['Sampled_Date'] = pd.to_datetime(df['Sampled_Date'])

Then drop the duplicates, take the nlargest(2), and take the last value of that:

df['Sampled_Date'].drop_duplicates().nlargest(2).iloc[-1]

# Timestamp('2016-02-28 00:00:00')

Upvotes: 3

Related Questions