Reputation: 1344
I have a DataFrame of values recorded and the index set to DatetimeIndex. A value is recorded approximately every 15 minutes.
I want to add a new column that is the fractional difference of the current value from a value 24 hours previously. Since the values are recorded approximately every fifteen minutes, I want to shift to the time index that is closest to 24 hours previously. If I try to do this exactly, I end up with a whole lot of NaNs:
df["value"] / df["value"].shift(freq = datetime.timedelta(days = -1))
How should this shift be done so that the shift is to the nearest possible time index to the one specified? Is there an alternative, easier way to think about this?
Here is an example that illustrates the issue:
import datetime
import pandas as pd
df = pd.DataFrame(
    [
        [pd.Timestamp("2015-07-18 13:53:33.280"), 10],
        [pd.Timestamp("2015-07-19 13:54:03.330"), 20],
        [pd.Timestamp("2015-07-20 13:52:13.350"), 30],
        [pd.Timestamp("2015-07-21 13:56:03.126"), 40],
        [pd.Timestamp("2015-07-22 13:53:51.747"), 50],
        [pd.Timestamp("2015-07-23 13:53:29.346"), 60]
    ],
    columns = [
        "datetime",
        "value"
    ]
)
# Use the timestamps as a DatetimeIndex
df.index = df["datetime"]
del df["datetime"]
df.index = pd.to_datetime(df.index.values)
# Exact one-day shift: the shifted timestamps never match, so this is all NaN
df["change"] = df["value"] / df["value"].shift(freq = datetime.timedelta(days = -1))
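As an illustration of why the exact shift gives only NaN, here is a minimal sketch on the first three rows of the example. The names values, shifted, common and ratio are only illustrative. Shifting with freq moves the index labels, and the division then aligns on exact labels, none of which coincide.

```python
import datetime
import pandas as pd

values = pd.Series(
    [10, 20, 30],
    index=pd.to_datetime([
        "2015-07-18 13:53:33.280",
        "2015-07-19 13:54:03.330",
        "2015-07-20 13:52:13.350",
    ]),
)

# shift(freq=...) moves the index labels and leaves the data untouched.
shifted = values.shift(freq=datetime.timedelta(days=-1))

# Division aligns on exact labels; no shifted label equals an original one.
common = values.index.intersection(shifted.index)
ratio = values / shifted

print(len(common))         # 0
print(ratio.isna().all())  # True
```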
Upvotes: 1
Views: 866
Reputation: 651
Following your code:
df/df.shift(1)
value
2015-07-18 13:53:33.280 NaN
2015-07-19 13:54:03.330 2.000000
2015-07-20 13:52:13.350 1.500000
2015-07-21 13:56:03.126 1.333333
2015-07-22 13:53:51.747 1.250000
2015-07-23 13:53:29.346 1.200000
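I'm not sure whether this is what you need, but it seems to give the same answer. Note that shift(1) moves the values by one row rather than by 24 hours, so it only matches when consecutive rows are about a day apart, as in your example.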
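To check how a row-based shift behaves on data recorded every 15 minutes, here is a small sketch on a hypothetical, perfectly regular series (the data and the names one_row_back and one_day_back are made up for illustration). With such data, shift(1) compares against the value 15 minutes earlier, while 96 rows correspond to 24 hours only if no readings are missing.

```python
import pandas as pd

# Hypothetical regular 15-minute series covering two days (96 rows per day).
index = pd.date_range("2015-07-18 00:00", periods=2 * 96, freq="15min")
values = pd.Series(range(len(index)), index=index, dtype="float64")

# One row back is 15 minutes earlier, not 24 hours earlier.
one_row_back = values.shift(1)

# 96 rows back is 24 hours earlier, but only because no rows are missing here.
one_day_back = values.shift(96)

print(values.index[100] - values.index[99])  # 0 days 00:15:00
print(values.index[100] - values.index[4])   # 1 days 00:00:00
```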
Upvotes: 0
Reputation: 294218
I'd add one day to the index, then use pd.DataFrame.reindex with method='nearest'
df / df.set_index(df.index + pd.offsets.Day()).reindex(df.index, method='nearest')
value
2015-07-18 13:53:33.280 1.000000
2015-07-19 13:54:03.330 2.000000
2015-07-20 13:52:13.350 1.500000
2015-07-21 13:56:03.126 1.333333
2015-07-22 13:53:51.747 1.250000
2015-07-23 13:53:29.346 1.200000
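To see which earlier row each timestamp gets paired with, one option is to ask the shifted index directly via Index.get_indexer. This is only a sketch for inspection; the names shifted_index and positions are illustrative. It shows that the first row is matched with its own shifted copy, which is why its ratio is 1.0.

```python
import pandas as pd

index = pd.to_datetime([
    "2015-07-18 13:53:33.280",
    "2015-07-19 13:54:03.330",
    "2015-07-20 13:52:13.350",
    "2015-07-21 13:56:03.126",
    "2015-07-22 13:53:51.747",
    "2015-07-23 13:53:29.346",
])

# Timestamps moved forward by one day, as in the reindex approach.
shifted_index = index + pd.Timedelta(days=1)

# Position in shifted_index that is nearest to each original timestamp.
positions = shifted_index.get_indexer(index, method="nearest")
print(list(positions))  # [0, 0, 1, 2, 3, 4]
```

Position 0 appears twice: the first timestamp has no reading a day earlier, so the nearest candidate is its own value a full day away.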
You can also pass an offset as the tolerance for method='nearest', so that rows with no match within that distance become NaN:
df / df.set_index(df.index + pd.offsets.Day()).reindex(
    df.index, method='nearest', tolerance=pd.offsets.Hour(12))
value
2015-07-18 13:53:33.280 NaN
2015-07-19 13:54:03.330 2.000000
2015-07-20 13:52:13.350 1.500000
2015-07-21 13:56:03.126 1.333333
2015-07-22 13:53:51.747 1.250000
2015-07-23 13:53:29.346 1.200000
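Putting the pieces together, here is a minimal sketch of how the result could be stored as the change column from the question. The intermediate names previous and aligned are illustrative, and the 12-hour tolerance is just the value used above. It computes the same ratio as the question's code; subtract 1 if a fractional difference relative to the earlier value is wanted.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50, 60]},
    index=pd.to_datetime([
        "2015-07-18 13:53:33.280",
        "2015-07-19 13:54:03.330",
        "2015-07-20 13:52:13.350",
        "2015-07-21 13:56:03.126",
        "2015-07-22 13:53:51.747",
        "2015-07-23 13:53:29.346",
    ]),
)

# Copy of the values whose timestamps are moved forward by one day, so each
# label lines up with the reading taken about 24 hours before it.
previous = df["value"].copy()
previous.index = previous.index + pd.Timedelta(days=1)

# For each original timestamp take the closest shifted one, but only if it
# lies within 12 hours; otherwise the entry is NaN.
aligned = previous.reindex(
    df.index, method="nearest", tolerance=pd.Timedelta(hours=12)
)

df["change"] = df["value"] / aligned
print(df)
```

The reindex call with method='nearest' needs the index to be sorted, which holds for the example data.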
Upvotes: 2