Richard Dixon
Richard Dixon

Reputation: 341

"window must be an integer 0 or greater" issue with '30D' style rolling calculations

I've had a look and can't seem to find a solution to this issue. I'm wanting to calculate the rolling sum of the previous 30 days' worth of data at each date in the dataframe - by subgroup - for a set of data that isn't daily - it's spaced fairly irregularly. I've been attempting to use ChatGPT which is getting in a twist over it.

Initially the suggestion was that I'd not converted the Date column to datetime format to allow for the rolling calculation, but now from the code below:

import pandas as pd
from datetime import datetime, timedelta
import numpy as np

# Create a dataset with irregularly spaced dates spanning two years
np.random.seed(42)
date_rng = pd.date_range(start='2022-01-01', end='2023-12-31', freq='10D')  # Every 10 days
data = {'Date': np.random.choice(date_rng, size=30),
        'Group': np.random.choice(['A', 'B'], size=30),
        'Value': np.random.randint(1, 30, size=30)}

df = pd.DataFrame(data)

# Sort DataFrame by date
df.sort_values(by='Date', inplace=True)

df['Date'] = pd.to_datetime(df['Date'])

# Calculate cumulative sum by group within the previous 30 days from each day
df['RollingSum_Last30Days'] = df.groupby('Group')['Value'].transform(lambda x: x.rolling(window='30D', min_periods=1).sum())

I'm getting an error of:

ValueError: window must be an integer 0 or greater

I've found conflicting comments online as to whether the format '30D' works in rolling windows but I'm none the wiser as to a solution to this. Any help appreciated.

Running in VSCode in Python 3.11.8.

Upvotes: 2

Views: 547

Answers (2)

Soudipta Dutta
Soudipta Dutta

Reputation: 2152

Instead of using lambda, Create a function. Using a function is more readable.

import pandas as pd

data = {
    'Date': [
        '2024-01-01', '2024-01-05', '2024-01-10', '2024-02-01', 
        '2024-02-15', '2024-03-01', '2024-03-10', '2024-04-01',
        '2024-04-10', '2024-05-01', '2024-05-15', '2024-06-01'
    ],
    'Group': ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, 20, 15, 30, 25, 10, 35, 20, 5, 40, 30, 25]
}

df = pd.DataFrame(data)

# Convert 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Sort DataFrame by date
df.sort_values(by='Date', inplace=True)

# Set Date column as index
df.set_index('Date', inplace=True)
def calculate_rolling_sum(gr):
    aa = gr.rolling(window='30D',min_periods= 1)['Value'].sum()
    return aa

rolling_sums = df.groupby('Group',group_keys=False)\
.apply(calculate_rolling_sum,include_groups= False)
df['RollingSum_Last30Days'] = rolling_sums
print(df)
'''
           Group  Value  RollingSum_Last30Days
Date                                          
2024-01-01     A     10                   10.0
2024-01-05     A     20                   30.0
2024-01-10     B     15                   15.0
2024-02-01     B     30                   45.0
2024-02-15     A     25                   25.0
2024-03-01     B     10                   40.0
2024-03-10     A     35                   60.0
2024-04-01     B     20                   20.0
2024-04-10     A      5                    5.0
2024-05-01     B     40                   40.0
2024-05-15     A     30                   30.0
2024-06-01     B     25                   25.0
'''

Upvotes: 0

mozway
mozway

Reputation: 262214

The issue if that you need to specify which column to use as Date but don't have access to the Date with groupby.transform.

You could use groupby.apply:

# Calculate cumulative sum by group within the previous 30 days from each day
df['RollingSum_Last30Days'] = (df.groupby('Group', group_keys=False)
                                 .apply(lambda x: x.rolling(window='30D', on='Date', min_periods=1)['Value'].sum())
                              )

Output:

         Date Group  Value  RollingSum_Last30Days
9  2022-01-11     A     22                   22.0
12 2022-01-11     A     22                   44.0
6  2022-01-21     A      4                   48.0
1  2022-05-21     B     14                   14.0
23 2022-05-21     A      8                    8.0
15 2022-07-20     B     26                   26.0
4  2022-07-20     A     18                   18.0
18 2022-07-30     B     10                   36.0
7  2022-07-30     A      2                   20.0
5  2022-08-19     A      8                   10.0
10 2022-10-18     B     10                   10.0
16 2022-11-17     B     12                   12.0
11 2023-01-06     B      4                    4.0
21 2023-02-15     B     16                   16.0
26 2023-04-06     B     28                   28.0
19 2023-04-26     A      4                    4.0
28 2023-05-16     B      8                    8.0
0  2023-05-26     B      3                   11.0
8  2023-06-05     A      6                    6.0
29 2023-06-25     A     21                   27.0
17 2023-07-25     A      2                    2.0
20 2023-08-04     B     14                   14.0
22 2023-08-14     B     15                   29.0
14 2023-08-14     B     18                   47.0
3  2023-08-24     A      4                    4.0
24 2023-09-03     B     14                   47.0
25 2023-09-03     A     23                   27.0
27 2023-09-03     A     25                   52.0
13 2023-09-23     B     29                   43.0
2  2023-12-12     A     17                   17.0

Upvotes: 1

Related Questions