Pandas: rolling sum using precedent rows within timeframe, with time not the index

Question

Let us suppose I have the following dataframe:

data = {'measure_id': ['0', '1', '2', '3', '4'],
        'measure_time': ['2019-11-22 22:30:08.559000', '2019-11-22 22:36:09.149000', '2019-11-22 22:36:09.149000', '2019-11-22 22:40:09.261000', '2019-11-22 22:46:10.011000'],
        'pressure': [10, 9, 3, 11, 12]}
df = pd.DataFrame(data, columns=['measure_id', 'measure_time', 'pressure'])
df.set_index('measure_id', inplace=True)
print('df:
', df)

I need to compute for each row the time rolling sum of all precedent rows, that is where the difference:

measure_time of current row - measure_time of precedent row

is inferior to some value (let us say 2 minutes for example), the current row included in the sum. This would give here the column rolling_sum:

data = {'measure_id': ['0', '1', '2', '3', '4'],
        'measure_time': ['2019-11-22 22:30:08.559000', '2019-11-22 22:36:09.149000', '2019-11-22 22:36:09.149000', '2019-11-22 22:40:09.261000', '2019-11-22 22:46:10.011000'],
        'pressure': [10, 9, 3, 11, 12],
        'rolling_sum':[10, 9, 12, 11, 12]}
df = pd.DataFrame(data, columns=['measure_id', 'measure_time', 'pressure', 'rolling_sum'])
df.set_index('measure_id', inplace=True)
print('df:
', df)

It seems now Pandas can handle rolling with offset Pandas: rolling mean by time interval , but I have not been able to make this solution work here (probably because of the non-time index).

FBruzzesi · Accepted Answer

First you should convert the measure_time column to date type:

df['measure_time'] = pd.to_datetime(df['measure_time'])

Then you can use the rolling method on such column, based on time difference as follows:

result_df = df.rolling('2s', on='measure_time').sum()

result_df

                   measure_time     pressure
measure_id                                  
0          2019-11-22 22:30:08.559      10.0
1          2019-11-22 22:36:09.149       9.0
2          2019-11-22 22:36:09.149      12.0
3          2019-11-22 22:40:09.261      11.0
4          2019-11-22 22:46:10.011      12.0

where 2s indicates a 2 seconds window.

Pandas: rolling sum using precedent rows within timeframe, with time not the index

Answers (1)

Related Questions