"window must be an integer 0 or greater" issue with '30D' style rolling calculations

Question

I've had a look and can't seem to find a solution to this issue. I'm wanting to calculate the rolling sum of the previous 30 days' worth of data at each date in the dataframe - by subgroup - for a set of data that isn't daily - it's spaced fairly irregularly. I've been attempting to use ChatGPT which is getting in a twist over it.

Initially the suggestion was that I'd not converted the Date column to datetime format to allow for the rolling calculation, but now from the code below:

import pandas as pd
from datetime import datetime, timedelta
import numpy as np

# Create a dataset with irregularly spaced dates spanning two years
np.random.seed(42)
date_rng = pd.date_range(start='2022-01-01', end='2023-12-31', freq='10D')  # Every 10 days
data = {'Date': np.random.choice(date_rng, size=30),
        'Group': np.random.choice(['A', 'B'], size=30),
        'Value': np.random.randint(1, 30, size=30)}

df = pd.DataFrame(data)

# Sort DataFrame by date
df.sort_values(by='Date', inplace=True)

df['Date'] = pd.to_datetime(df['Date'])

# Calculate cumulative sum by group within the previous 30 days from each day
df['RollingSum_Last30Days'] = df.groupby('Group')['Value'].transform(lambda x: x.rolling(window='30D', min_periods=1).sum())

I'm getting an error of:

ValueError: window must be an integer 0 or greater

I've found conflicting comments online as to whether the format '30D' works in rolling windows but I'm none the wiser as to a solution to this. Any help appreciated.

Running in VSCode in Python 3.11.8.

mozway · Accepted Answer

The issue if that you need to specify which column to use as Date but don't have access to the Date with groupby.transform.

You could use groupby.apply:

# Calculate cumulative sum by group within the previous 30 days from each day
df['RollingSum_Last30Days'] = (df.groupby('Group', group_keys=False)
                                 .apply(lambda x: x.rolling(window='30D', on='Date', min_periods=1)['Value'].sum())
                              )

Output:

         Date Group  Value  RollingSum_Last30Days
9  2022-01-11     A     22                   22.0
12 2022-01-11     A     22                   44.0
6  2022-01-21     A      4                   48.0
1  2022-05-21     B     14                   14.0
23 2022-05-21     A      8                    8.0
15 2022-07-20     B     26                   26.0
4  2022-07-20     A     18                   18.0
18 2022-07-30     B     10                   36.0
7  2022-07-30     A      2                   20.0
5  2022-08-19     A      8                   10.0
10 2022-10-18     B     10                   10.0
16 2022-11-17     B     12                   12.0
11 2023-01-06     B      4                    4.0
21 2023-02-15     B     16                   16.0
26 2023-04-06     B     28                   28.0
19 2023-04-26     A      4                    4.0
28 2023-05-16     B      8                    8.0
0  2023-05-26     B      3                   11.0
8  2023-06-05     A      6                    6.0
29 2023-06-25     A     21                   27.0
17 2023-07-25     A      2                    2.0
20 2023-08-04     B     14                   14.0
22 2023-08-14     B     15                   29.0
14 2023-08-14     B     18                   47.0
3  2023-08-24     A      4                    4.0
24 2023-09-03     B     14                   47.0
25 2023-09-03     A     23                   27.0
27 2023-09-03     A     25                   52.0
13 2023-09-23     B     29                   43.0
2  2023-12-12     A     17                   17.0

"window must be an integer 0 or greater" issue with '30D' style rolling calculations

Answers (2)

Related Questions

&quot;window must be an integer 0 or greater&quot; issue with &#39;30D&#39; style rolling calculations

Answers (2)

Related Questions

"window must be an integer 0 or greater" issue with '30D' style rolling calculations