Stix
Stix

Reputation: 49

Pandas rolling with variable window length

I was able to find a couple of similar questions to this one, but I don't believe they answer what I am trying to accomplish.

Quite simply I am trying to use rolling.min with a variable window length from another column in the df. My dataset will probably get pretty large in the future so I am trying to pull this off without having to use a loop

pvdata['templow'] = pvdata['low'].rolling(window = pvdata['counter']).min()

Trying to run this I get: ValueError: window must be an integer 0 or greater, I believe the issue is that I am passing a series instead of an integer for the window length so my next attempt was to use apply lambda:

pvdata['templow'] = pvdata['counter'].apply(lambda x: pvdata['low'].rolling(window = x).min())

This returned the error: ValueError: Expected a 1D array, got an array with shape (10989, 10989), which I cannot explain.

Here is an example of the output I am looking for:

enter image description here

Is this possible to do without a loop?

Upvotes: 1

Views: 1030

Answers (2)

Ahmed Thahir
Ahmed Thahir

Reputation: 3

Check this out. I created a variable size rolling statistics using a custom window indexer

import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer

class VariableWindowIndexer(BaseIndexer):
    def __init__(self, window_size, max_periods=None):
        super().__init__()
        self.window_size = window_size.values
        self.max_periods = max_periods

    def get_window_bounds(self, num_values, min_periods, center, closed, step):
        temp = np.arange(0, num_values, 1)

        if self.max_periods is not None:
          self.window_size = np.where(
              self.window_size <= self.max_periods,
              self.window_size,
              self.max_periods
          )

        window_end = temp + 1
        window_start = window_end - self.window_size
        return window_start, window_end

@Stix, you can change window_size to your counter column

horizon = 1
lag = df["y"].shift(horizon)
window_size = df.index + 1

rolling = lag.rolling(
    VariableWindowIndexer(window_size=window_size, max_periods=None),
    min_periods=1,
    center=False
)

df[["y_rolling_mean", "y_rolling_std"]] = rolling.agg(["mean", "std"])
t y lag y_rolling_mean y_rolling_std
0 1 1 NaN NaN NaN
1 2 2 1.0 1.0 NaN
2 3 3 2.0 1.5 0.707107
3 4 4 3.0 2.0 1.000000
4 5 5 4.0 2.5 1.290994
5 6 6 5.0 3.0 1.581139
6 7 7 6.0 3.5 1.870829
7 8 8 7.0 4.0 2.160247
8 9 9 8.0 4.5 2.449490
9 10 10 9.0 5.0 2.738613
10 11 11 10.0 5.5 3.027650
11 12 12 11.0 6.0 3.316625
12 13 13 12.0 6.5 3.605551
13 14 14 13.0 7.0 3.894440

Upvotes: 0

Code Different
Code Different

Reputation: 93191

You cannot call rolling with a variable window size. Depend on how many unique window sizes there are in the counter column, it maybe faster to call rolling with each size and combine the results.

Sample data

n = 1_000_000
m = 10 # The number of unique values in `counter`

np.random.seed(42)
low = np.random.randint(1, 100, n)
counter = np.random.choice(np.arange(1, m + 1), n)

pvdata = pd.DataFrame({"low": low, "counter": counter})

The fast method, but with limitations

codes, uniques = pd.factorize(pvdata["counter"], sort=True)
arr = np.vstack([pvdata["low"].rolling(u, min_periods=1).min() for u in uniques]).T
templow = arr[range(len(pvdata)), codes]

The slow but reliable method

n = len(pvdata)
low = pvdata["low"].to_numpy()
counter = pvdata["counter"].to_numpy()

templow = np.repeat(np.nan, n)
for i, c in zip(range(n), counter):
    lb = max(0, i - c + 1)
    ub = i + 1
    templow[i] = low[lb:ub].min()

Discussion

As m increases, the fast method will use more CPU and memory because:

  • It calls rolling more.
  • It must store the result of all those rolling calls. arr requires n x m x 8 bytes of memory.

The "fast" method becomes slower as m increases. During my test, both methods are about equal at m = 150, after which "fast" becomes the slower one. Experiment with your data and pick an algorithm that suits your needs.

Upvotes: 1

Related Questions