Fred S

Reputation: 1511

Arithmetic operations on datetime index in pandas

In pandas, you can access specific positions of a time series either by classical integer-position (row-based) indexing or by datetime-based indexing. An integer index can be manipulated with basic arithmetic: if I have an integer_index into a time series with a frequency of 12 hours and I want the entry exactly one day earlier, I can simply use integer_index - 2. However, real-world data are not always perfect, and sometimes rows are missing. In that case this method fails, and it would be helpful to use datetime-based indexing and subtract, for example, one day from the index. How can I do this?

Sample script:

# generate a sample time series
import pandas as pd
s = pd.Series(["A", "B", "C", "D", "E"], index=pd.date_range("2000-01-01", periods=5, freq="12h"))
print(s)

2000-01-01 00:00:00    A
2000-01-01 12:00:00    B
2000-01-02 00:00:00    C
2000-01-02 12:00:00    D
2000-01-03 00:00:00    E
Freq: 12H, dtype: object

# these two indices should access the same value ("C")
integer_index = 2
date_index = "2000-01-02 00:00"

print(s.iloc[integer_index])  # prints "C"
print(s[date_index])  # prints "C"

# I can access the value one day earlier by subtracting 2 from the integer index
print(s.iloc[integer_index - 2])  # prints "A"

# how can I subtract one day from the date index?
print(s[date_index - 1])  # raises an error, because date_index is a plain string

The background to this question can be found in an earlier submission of mine here:

Fill data gaps with average of data from adjacent days

where user JohnE found a workaround to my problem that uses integer position based indexing. He makes sure that I have equally spaced data by resampling the time series.
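
For context, the resampling idea can be sketched roughly as follows. This is not JohnE's actual code, only a minimal illustration that assumes a hypothetical series s_gappy with one 12-hour row missing and uses asfreq to put it back on a regular grid.

```python
import pandas as pd

# Hypothetical gappy series: the 2000-01-01 12:00 row is missing
s_gappy = pd.Series(
    ["A", "C", "D", "E"],
    index=pd.to_datetime(
        ["2000-01-01 00:00", "2000-01-02 00:00", "2000-01-02 12:00", "2000-01-03 00:00"]
    ),
)

# Reindex onto a regular 12-hour grid; the missing slot becomes NaN
regular = s_gappy.asfreq("12h")
print(regular)

# With equal spacing restored, "one day earlier" is again two rows back
print(regular.iloc[2 - 2])  # prints "A"
```

The drawback is that the series has to be regularised first, which is what I would like to avoid by doing the arithmetic on the dates themselves.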

Upvotes: 5

Views: 11914

Answers (2)

VMQ

Reputation: 193

The answer by Ffisegydd is excellent, but pandas provides its own equivalent, the Timedelta class, which is compatible with np.timedelta64 and has a few more bells and whistles. Just replace timedelta(days=1) with pd.Timedelta(days=1) in his example for better compatibility with the rest of pandas.
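
As a minimal sketch of that substitution, assuming the same sample series as in the question (the names previous_day and value are only illustrative):

```python
import pandas as pd

s = pd.Series(
    ["A", "B", "C", "D", "E"],
    index=pd.date_range("2000-01-01", periods=5, freq="12h"),
)

# pd.Timestamp parses the string, so no strptime format is needed
date_index = pd.Timestamp("2000-01-02 00:00")
one_day = pd.Timedelta(days=1)

# Date arithmetic first, then a label-based lookup
previous_day = date_index - one_day
value = s.loc[previous_day]

print(previous_day)  # 2000-01-01 00:00:00
print(value)  # prints "A"
```

Using pd.Timestamp together with pd.Timedelta keeps everything in pandas' own types, so the standard-library datetime import is not needed here.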

Upvotes: 2

Ffisegydd

Reputation: 53698

Your datetime index isn't based on strings; it's a DatetimeIndex, which means you can index it with datetime objects rather than with a string that merely looks like a date.

The code below converts date_index into a datetime object and then uses timedelta(days=1) to subtract one day from it.

# generate a sample time series
import pandas as pd
from datetime import datetime, timedelta

s = pd.Series(["A", "B", "C", "D", "E"], index=pd.date_range("2000-01-01", periods=5, freq="12h"))
print(s)

# these two indices should access the same value ("C")
integer_index = 2
# Converts the string into a datetime object
date_index = datetime.strptime("2000-01-02 00:00", "%Y-%m-%d %H:%M")
print(date_index) # 2000-01-02 00:00:00

print(s.iloc[integer_index])  # prints "C"
print(s[date_index])  # prints "C"

print(s.iloc[integer_index - 2])  # prints "A"

one_day = timedelta(days=1)
print(s[date_index - one_day]) # prints "A"
print(date_index - one_day) # 2000-01-01 00:00:00
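
Since the question is motivated by missing rows, here is a rough sketch of how the two approaches behave on a gappy series. The series s_gappy and the variable names are hypothetical, and the use of Series.get to tolerate an absent label is just one possible choice, not the only way to handle it.

```python
import pandas as pd

s = pd.Series(
    ["A", "B", "C", "D", "E"],
    index=pd.date_range("2000-01-01", periods=5, freq="12h"),
)

# Hypothetical gap: drop the 2000-01-01 12:00 row ("B")
s_gappy = s.drop(pd.Timestamp("2000-01-01 12:00"))

one_day = pd.Timedelta(days=1)
date_index = pd.Timestamp("2000-01-02 12:00")  # the "D" row

# Positional arithmetic silently lands on the wrong row:
# two rows back is now 2000-01-01 00:00, which is 1.5 days earlier
position = s_gappy.index.get_loc(date_index)
wrong_row = s_gappy.iloc[position - 2]
print(wrong_row)  # prints "A"

# Date arithmetic targets the right timestamp; .get returns None
# instead of raising KeyError when that row is missing
missing_value = s_gappy.get(date_index - one_day)
print(missing_value)  # prints None

# When the row one day earlier exists, the lookup works as before
present_value = s_gappy.get(pd.Timestamp("2000-01-03 00:00") - one_day)
print(present_value)  # prints "C"
```

Plain s_gappy[date_index - one_day] would raise a KeyError for the missing row, so whether to use .get, a try/except, or a reindex depends on how you want to treat the gaps.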

Upvotes: 4
