Reputation: 481
I have data for example:
Etc.. Now I can find max and min dates as
df.Sampled_Date.max()
df.Sampled_Date.min()
But how to find the second most recent date. i.e 2/28/2016 in Python's pandas data frame.
Upvotes: 1
Views: 6250
Reputation: 789
I know this is an extension of the question, but it's something I frequently need and sometimes forget, so I'm sharing here:
Let's say instead of just wanting the second most recent or second earliest dates for an entire dataframe, you have a dataframe of users and dates, and you want to get the second earliest date for each user (e.g. their second transaction).
Example dataframe:
test = pd.DataFrame()
test['users'] = [1,2,3,2,3,2]
test['dates'] = pd.to_datetime(['2019-01-01','2019-01-01',
'2019-01-02','2019-01-02',
'2019-01-03','2019-01-04'])
The earliest date for user 2 is '2019-01-01' and the second earliest date is '20-19-01-02'. We can use groupby, apply, and nlargest/nsmallest:
test.groupby('users')['dates'].apply(lambda x: x.nsmallest(2).max())
which gives us this output:
users
1 2019-01-01
2 2019-01-02
3 2019-01-03
Name: dates, dtype: datetime64[ns]
Upvotes: 3
Reputation: 6213
You can also use .argsort()
import pandas as pd
# Generate dates
dates = pd.Series(pd.date_range(start='1/1/2017', periods=5, freq=pd.offsets.MonthEnd(3)))
# Random order
dates = dates.sample(frac=1, random_state=0)
# Get the second 'max' date
dates[dates.argsort() == (len(dates)-2)] # 3 2017-10-31
Upvotes: 1
Reputation: 51335
Make sure your dates are in datetime first:
df['Sampled_Date'] = pd.to_datetime(df['Sampled_Date'])
Then drop the duplicates, take the nlargest(2)
, and take the last value of that:
df['Sampled_Date'].drop_duplicates().nlargest(2).iloc[-1]
# Timestamp('2016-02-28 00:00:00')
Upvotes: 3