Reputation: 113
Let us suppose I have the following dataframe:
data = {'measure_id': ['0', '1', '2', '3', '4'],
'measure_time': ['2019-11-22 22:30:08.559000', '2019-11-22 22:36:09.149000', '2019-11-22 22:36:09.149000', '2019-11-22 22:40:09.261000', '2019-11-22 22:46:10.011000'],
'pressure': [10, 9, 3, 11, 12]}
df = pd.DataFrame(data, columns=['measure_id', 'measure_time', 'pressure'])
df.set_index('measure_id', inplace=True)
print('df:\n', df)
I need to compute for each row the time rolling sum of all precedent rows, that is where the difference:
measure_time of current row - measure_time of precedent row
is inferior to some value (let us say 2 minutes for example), the current row included in the sum. This would give here the column rolling_sum:
data = {'measure_id': ['0', '1', '2', '3', '4'],
'measure_time': ['2019-11-22 22:30:08.559000', '2019-11-22 22:36:09.149000', '2019-11-22 22:36:09.149000', '2019-11-22 22:40:09.261000', '2019-11-22 22:46:10.011000'],
'pressure': [10, 9, 3, 11, 12],
'rolling_sum':[10, 9, 12, 11, 12]}
df = pd.DataFrame(data, columns=['measure_id', 'measure_time', 'pressure', 'rolling_sum'])
df.set_index('measure_id', inplace=True)
print('df:\n', df)
It seems now Pandas can handle rolling with offset Pandas: rolling mean by time interval , but I have not been able to make this solution work here (probably because of the non-time index).
Upvotes: 1
Views: 161
Reputation: 6485
First you should convert the measure_time
column to date type:
df['measure_time'] = pd.to_datetime(df['measure_time'])
Then you can use the rolling
method on such column, based on time difference as follows:
result_df = df.rolling('2s', on='measure_time').sum()
result_df
measure_time pressure
measure_id
0 2019-11-22 22:30:08.559 10.0
1 2019-11-22 22:36:09.149 9.0
2 2019-11-22 22:36:09.149 12.0
3 2019-11-22 22:40:09.261 11.0
4 2019-11-22 22:46:10.011 12.0
where 2s
indicates a 2 seconds window.
Upvotes: 1