Occhima
Occhima

Reputation: 253

Finding the closest date inside a Pandas dataframe given a condition

I have this S&P 500 historical data sample, and I want to compare the dates inside of him.

>> df

                  High         Low        Open       Close       Volume   Adj Close
Date
2011-01-03  127.599998  125.699997  126.709999  127.050003  138725200.0  104.119293
2011-01-04  127.370003  126.190002  127.330002  126.980003  137409700.0  104.061905
2011-01-05  127.720001  126.459999  126.580002  127.639999  133975300.0  104.602806
2011-01-06  127.830002  127.010002  127.690002  127.389999  122519000.0  104.397934
2011-01-07  127.769997  126.150002  127.559998  127.139999  156034600.0  104.193031
...                ...         ...         ...         ...          ...         ...
2020-12-14  369.799988  364.470001  368.640015  364.660004   69216200.0  363.112183
2020-12-15  369.589996  365.920013  367.399994  369.589996   64071100.0  368.021240
2020-12-16  371.160004  368.869995  369.820007  370.170013   58420500.0  368.598816
2020-12-17  372.459991  371.049988  371.940002  372.239990   64119500.0  370.660004
2020-12-18  371.149994  367.019989  370.970001  369.179993  135359900.0  369.179993

let latest be the most recent SnP OHLC prices


latest = df.iloc[-1]

How can I find the date inside this dataframe index that is the closest to latest lagged by 1 year (latest.replace(year=latest.year-1)? Just using the pd.Timestamp.replace method sometimes doesn't work, it can generates a date that is not inside my index.

Upvotes: 1

Views: 1877

Answers (1)

sunnytown
sunnytown

Reputation: 1996

This approach only works if your index column ('Date') contains DateTime objects. If it contains strings, you first have to convert the index to DateTime format.

df.index = pd.to_datetime(df.index)

With that, you can get the latest time either with latest = df.index[-1] or df.index.max().

Then we offset the lastest date by one year using pd.DateOffset and get the theoretical lagged date.

lagged_theoretical = latest - pd.DateOffset(years=1)

To obtain the closest date to the calculated date that is actually present in your DataFrame, we calculate the time delta between all the dates in your dataframe and the calculated date. From there, we selected the minimum to get the closest date. We fetch the index of the minimum in the timedelta array and use that index to get the actual date from the DataFrame`s index column. Here is the whole code:

latest = df.index[-1]
lagged_theoretical = latest - pd.DateOffset(years=1)
td = (abs(df.index - lagged_theoretical)).values
idx = np.where(td == td.min())[0][0]
lagged_actual = df.index[idx]

Upvotes: 1

Related Questions