compguy24

Reputation: 957

Compare current dataframe values to aggregate values from previous timesteps in pandas

I have a pandas DataFrame indexed by timestamps at 15-minute intervals. At each interval, I have multiple columns: a, b and c.

| index   | a | b | c |
| 9:00 am | 2 | 2 | 4 |
| 9:15 am | 2 | 2 | 4 |
...

I need to compare the current value of a with the average of a at the same time of day 1, 2, 3 and 4 weeks back. So if the current time is 9:15 am, I need the average of a at 9:15 am from 1, 2, 3 and 4 weeks earlier.

Obviously this cannot be calculated for the first 4 weeks of the dataset, because there is not enough history. I'm stuck on how to shift the data frame back in time to get those values, aggregate them, and then compare the result with the current row.

There is some similarity to this question but there the index is not a timeseries, and the comparison is a bit simpler.

Upvotes: 2

Views: 196

Answers (1)

Charles Landau

Reputation: 4265

Here I do it with days instead of weeks. I start with making dummy data based on your example:

import pandas as pd
import random

# Dummy data: 30000 rows at 15-minute intervals, starting 2017-01-01 12:00.
start = pd.Timestamp(year=2017, month=1, day=1, hour=12)
d = [
    {"ts": start + pd.Timedelta(x * 15, unit="min"),
     "a": random.randint(2, 5),
     "b": random.randint(2, 5),
     "c": random.randint(2, 5)} for x in range(30000)
]
dft = pd.DataFrame(d).set_index("ts")

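I define a handler function that looks up the value of a exactly 0, 1, 2 and 3 days before each row and averages them. Rows in the first 3 days raise a KeyError, because one of the lookups falls before the start of the data, so there's a try-except that returns np.nan. Note the Timedelta(unit=) kwarg: you can change it to get the same effect for other units (for example, unit="W" for weeks). I think this is less error-prone than tweaking the call to range. To leave the current row out of the average, as in your question, use range(1, 5) instead of range(4).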

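import numpy as np

def handler(row):
    # Average "a" at this row's timestamp and 1, 2 and 3 days earlier.
    try:
        return np.mean([dft.loc[row.name - pd.Timedelta(x, unit="d"), "a"]
                        for x in range(4)])
    except KeyError:
        # Not enough history for this row.
        return np.nan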

Finally, use apply.

dft.apply(handler, axis=1)

It's fairly slow because apply with axis=1 calls the handler once per row, so I'll try to think of a faster way, but for now I think this is it.
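One possibly faster route is to avoid the per-row lookups altogether. The following is a minimal sketch, not a benchmarked solution: it assumes a regular 15-minute DatetimeIndex with no gaps or duplicates, and it uses weeks as in the question. Series.shift with freq moves the index forward, so the value from k weeks ago lines up with the current timestamp. The names df, lagged, a_prev_mean and a_diff are illustrative only.

```python
import numpy as np
import pandas as pd

# Six weeks of dummy data at 15-minute intervals (96 rows per day).
idx = pd.date_range("2017-01-01", periods=6 * 7 * 96, freq="15min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.integers(2, 6, size=len(idx))}, index=idx)

# Shift the index forward by k weeks so that the value observed at
# (t - k weeks) is labelled t. One column per lag, keyed by k.
lagged = pd.concat(
    {k: df["a"].shift(freq=pd.Timedelta(weeks=k)) for k in range(1, 5)},
    axis=1,
)

# Shifting extends the index past the end of the data; keep only the
# original timestamps. Rows without a full 4 weeks of history get NaN
# in at least one lag column.
lagged = lagged.reindex(df.index)

# skipna=False makes the mean NaN unless all four lags are present.
df["a_prev_mean"] = lagged.mean(axis=1, skipna=False)

# Compare the current value with the historical average.
df["a_diff"] = df["a"] - df["a_prev_mean"]
```

The first 4 weeks come out as NaN, which matches the "not enough history" case in the question. If the index has gaps, the reindex step still aligns by timestamp, but any row with a missing lag will be NaN.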

Upvotes: 2
