ValueError: window must be an integer 0 or greater when using rolling window on reset_index

Question

I am trying to compute the rolling sum of a column for each date based on each unique combination of two features. Here is an example of the code:

import pandas as pd
from tqdm import tqdm

# Example DataFrame
data = {'DATE_ADDED': ['2024-03-01', '2024-03-02', '2024-03-03', '2024-03-04', '2024-03-05', '2024-03-06'],
        'MAKE': ['Toyota', 'Toyota', 'Toyota', 'Toyota', 'Toyota', 'Toyota'],
        'MODEL': ['Camry', 'Camry', 'Camry', 'Camry', 'Camry', 'Camry'],
        'MM_1': [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)

# Convert 'DATE_ADDED' column to datetime
df['DATE_ADDED'] = pd.to_datetime(df['DATE_ADDED'])

# Sort DataFrame by 'DATE_ADDED'
df = df.sort_values(by='DATE_ADDED')

# Get unique combinations of MAKE and MODEL
unique_make_model_combinations = df.groupby(['MAKE', 'MODEL']).size().reset_index()
unique_combinations_list = unique_make_model_combinations[['MAKE', 'MODEL']].values.tolist()

# Create an empty DataFrame to store results
result_df = pd.DataFrame(columns=['DATE_ADDED', 'MAKE', 'MODEL', 'MM_1_sum_30d'])

# Iterate over unique combinations
for combination in tqdm(unique_combinations_list, desc='Processing'):
    make, model = combination[0], combination[1]
    
    # Filter DataFrame for the specific make and model
    filtered_df = df[(df['MAKE'] == make) & (df['MODEL'] == model)]
    
    # Reset index to avoid duplicate index labels
    filtered_df.reset_index(drop=True, inplace=True)
    
    # Calculate rolling sum for the past 30 days
    filtered_df['MM_1_sum_30d'] = filtered_df['MM_1'].rolling('30D').sum()
    
    # Append results to the result DataFrame
    result_df = pd.concat([result_df, filtered_df], ignore_index=True)

print(result_df)

I can iterate over the df to perform the calculation, but it is a large df. Each date may have multiple entries, but I only want one to take one entry from each date. So if there are multiple TOYOTA CAMRY for a given date, I only need to sum one of those MM_1 values for each period in the rolling time frame that TOYOTA CAMRY exists.

I believe it has something to do with reindexing and potentially having a non-unique index.

Any advice would be greatly appreciated!

ValueError: window must be an integer 0 or greater when using rolling window on reset_index

Answers (1)

Related Questions