Reputation: 1511
In pandas, you can access specific positions of a time series either by classical integer-position (row-based) indexing or by datetime-based indexing. The integer-based index can be manipulated using basic arithmetic: for example, if I have an integer_index for a time series with a frequency of 12 hours and I want to access the entry exactly one day earlier, I can simply use integer_index - 2. However, real-world data are not always perfect, and sometimes rows are missing. In that case this method fails, and it would be helpful to be able to use datetime-based indexing and subtract, for example, one day from this index. How can I do this?
Sample script:
# generate a sample time series
import pandas as pd
s = pd.Series(["A", "B", "C", "D", "E"], index=pd.date_range("2000-01-01", periods=5, freq="12h"))
print(s)
2000-01-01 00:00:00 A
2000-01-01 12:00:00 B
2000-01-02 00:00:00 C
2000-01-02 12:00:00 D
2000-01-03 00:00:00 E
Freq: 12H, dtype: object
# these two indices should access the same value ("C")
integer_index = 2
date_index = "2000-01-02 00:00"
print(s[integer_index]) # prints "C"
print(s[date_index]) # prints "C"
# I can access the value one day earlier by subtracting 2 from the integer index
print(s[integer_index - 2]) # prints "A"
# how can I subtract one day from the date index?
print(s[date_index - 1]) # raises an error: date_index is a string, so "- 1" is not defined
The background to this question can be found in an earlier submission of mine here:
Fill data gaps with average of data from adjacent days
where user JohnE found a workaround to my problem that uses integer position based indexing. He makes sure that I have equally spaced data by resampling the time series.
Upvotes: 5
Views: 11914
Reputation: 193
The answer by Ffisegydd is excellent, but pandas also provides an equivalent class, pd.Timedelta, which is compatible with np.timedelta64 and has a few more bells and whistles. Just replace timedelta(days=1) with pd.Timedelta(days=1) in his example for better compatibility.
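A minimal sketch of that substitution, assuming the same sample series as in the question. Using pd.Timestamp to parse the date string (instead of datetime.strptime) is my own choice here, not part of the original answer.

```python
import pandas as pd

# Same sample series as in the question
s = pd.Series(
    ["A", "B", "C", "D", "E"],
    index=pd.date_range("2000-01-01", periods=5, freq="12h"),
)

# Parse the label into a pandas Timestamp (assumption: used in place of strptime)
date_index = pd.Timestamp("2000-01-02 00:00")
one_day = pd.Timedelta(days=1)

print(s[date_index])            # C
print(s[date_index - one_day])  # A
```

Because both objects are pandas types, the subtraction yields another Timestamp that can be used directly as a label on the DatetimeIndex.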
Upvotes: 2
Reputation: 53698
Your datetime index isn't based on strings; it's a DatetimeIndex, meaning you can index it with datetime objects rather than with a string that merely looks like a date.
The code below converts date_index into a datetime object and then uses timedelta(days=1) to subtract one day from it.
# generate a sample time series
import pandas as pd
from datetime import datetime, timedelta
s = pd.Series(["A", "B", "C", "D", "E"], index=pd.date_range("2000-01-01", periods=5, freq="12h"))
print(s)
# these two indices should access the same value ("C")
integer_index = 2
# Converts the string into a datetime object
date_index = datetime.strptime("2000-01-02 00:00", "%Y-%m-%d %H:%M")
print(date_index) # 2000-01-02 00:00:00
print(s[integer_index]) # prints "C"
print(s[date_index]) # prints "C"
print(s[integer_index - 2]) # prints "A"
one_day = timedelta(days=1)
print(s[date_index - one_day]) # prints "A"
print(date_index - one_day) # 2000-01-01 00:00:00
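Since the question is motivated by missing rows, it may be worth noting that a label lookup raises a KeyError when the shifted timestamp is absent from the index. The sketch below is one possible way to guard against that, not part of the original answer; the names gappy, target and previous are hypothetical, and the gap is simulated by dropping a row from the sample series.

```python
import pandas as pd

s = pd.Series(
    ["A", "B", "C", "D", "E"],
    index=pd.date_range("2000-01-01", periods=5, freq="12h"),
)

# Simulate a data gap by dropping the row for 2000-01-01 00:00
gappy = s.drop(pd.Timestamp("2000-01-01 00:00"))

# The label exactly one day before 2000-01-02 00:00
target = pd.Timestamp("2000-01-02 00:00") - pd.Timedelta(days=1)

# Series.get returns the default instead of raising KeyError for a missing label
previous = gappy.get(target, default=None)
print(previous)                # None, because the row is missing
print(target in gappy.index)   # False
print(s.get(target))           # A, the row exists in the complete series
```

Whether a missing day should yield None, be skipped, or be filled from a neighbouring value depends on the use case, so the default here is only a placeholder.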
Upvotes: 4