Reputation: 253
I have this S&P 500 historical data sample, and I want to compare the dates inside of him.
>> df
High Low Open Close Volume Adj Close
Date
2011-01-03 127.599998 125.699997 126.709999 127.050003 138725200.0 104.119293
2011-01-04 127.370003 126.190002 127.330002 126.980003 137409700.0 104.061905
2011-01-05 127.720001 126.459999 126.580002 127.639999 133975300.0 104.602806
2011-01-06 127.830002 127.010002 127.690002 127.389999 122519000.0 104.397934
2011-01-07 127.769997 126.150002 127.559998 127.139999 156034600.0 104.193031
... ... ... ... ... ... ...
2020-12-14 369.799988 364.470001 368.640015 364.660004 69216200.0 363.112183
2020-12-15 369.589996 365.920013 367.399994 369.589996 64071100.0 368.021240
2020-12-16 371.160004 368.869995 369.820007 370.170013 58420500.0 368.598816
2020-12-17 372.459991 371.049988 371.940002 372.239990 64119500.0 370.660004
2020-12-18 371.149994 367.019989 370.970001 369.179993 135359900.0 369.179993
let latest
be the most recent SnP OHLC prices
latest = df.iloc[-1]
How can I find the date inside this dataframe index that is the closest to latest
lagged by 1 year (latest.replace(year=latest.year-1)
? Just using the pd.Timestamp.replace
method sometimes doesn't work, it can generates a date that is not inside my index.
Upvotes: 1
Views: 1877
Reputation: 1996
This approach only works if your index column ('Date') contains DateTime objects. If it contains strings, you first have to convert the index to DateTime format.
df.index = pd.to_datetime(df.index)
With that, you can get the latest time either with latest = df.index[-1]
or df.index.max()
.
Then we offset the lastest date by one year using pd.DateOffset
and get the theoretical lagged date.
lagged_theoretical = latest - pd.DateOffset(years=1)
To obtain the closest date to the calculated date that is actually present in your DataFrame, we calculate the time delta between all the dates in your dataframe and the calculated date. From there, we selected the minimum to get the closest date. We fetch the index of the minimum in the timedelta array and use that index to get the actual date from the DataFrame`s index column. Here is the whole code:
latest = df.index[-1]
lagged_theoretical = latest - pd.DateOffset(years=1)
td = (abs(df.index - lagged_theoretical)).values
idx = np.where(td == td.min())[0][0]
lagged_actual = df.index[idx]
Upvotes: 1