poklaassen
poklaassen

Reputation: 109

How to create a rolling time window with offset in Pandas

I want to apply some statistics on records within a time window with an offset. My data looks something like this:

                             lon        lat  stat  ...   speed  course  head
ts                                                 ...                      
2016-09-30 22:00:33.272  5.41463  53.173161    15  ...     0.0     0.0   511
2016-09-30 22:01:42.879  5.41459  53.173180    15  ...     0.0     0.0   511
2016-09-30 22:02:42.879  5.41461  53.173161    15  ...     0.0     0.0   511
2016-09-30 22:03:44.051  5.41464  53.173168    15  ...     0.0     0.0   511
2016-09-30 22:04:53.013  5.41462  53.173141    15  ...     0.0     0.0   511

[5 rows x 7 columns]

I need the records within time windows of 600 seconds, with steps of 300 seconds. For example, these windows:

start                     end
2016-09-30 22:00:00.000   2016-09-30 22:10:00.000
2016-09-30 22:05:00.000   2016-09-30 22:15:00.000
2016-09-30 22:10:00.000   2016-09-30 22:20:00.000

I have looked at Pandas rolling to do this. But it seems like it does not have the option to add the offset which I described above. Am I overlooking something, or should I create a custom function for this?

Upvotes: 0

Views: 1914

Answers (1)

Markus Rother
Markus Rother

Reputation: 434

What you want to achieve should be possible by combining DataFrame.resample with DataFrame.shift.

import pandas as pd

index = pd.date_range('1/1/2000', periods=9, freq='T')
series = pd.Series(range(9), index=index)
df = pd.DataFrame(series)

That will give you a primitive timeseries (example taken from api docs DataFrame.resample).

2000-01-01 00:00:00  0                                                                                                                                                                        
2000-01-01 00:01:00  1                                                                                                                                                                        
2000-01-01 00:02:00  2                                                                                                                                                                        
2000-01-01 00:03:00  3                                                                                                                                                                        
2000-01-01 00:04:00  4                                                                                                                                                                        
2000-01-01 00:05:00  5                                                                                                                                                                        
2000-01-01 00:06:00  6                                                                                                                                                                        
2000-01-01 00:07:00  7                                                                                                                                                                        
2000-01-01 00:08:00  8

Now resample by your step size (see DataFrame.shift).

sampled = df.resample('90s').sum()

This will give you non-overlapping windows of the step size.

2000-01-01 00:00:00   1                                                                                                                                                                       
2000-01-01 00:01:30   2                                                                                                                                                                       
2000-01-01 00:03:00   7                                                                                                                                                                       
2000-01-01 00:04:30   5                                                                                                                                                                       
2000-01-01 00:06:00  13                                                                                                                                                                       
2000-01-01 00:07:30   8

Finally, shift the sampled df by one step and sum with the previously created df. Window size being twice the step size helps.

sampled.shift(1, fill_value=0) + sampled

This will yield:

2000-01-01 00:00:00   1                                                                                                                                                                       
2000-01-01 00:01:30   3                                                                                                                                                                       
2000-01-01 00:03:00   9                                                                                                                                                                       
2000-01-01 00:04:30  12                                                                                                                                                                       
2000-01-01 00:06:00  18                                                                                                                                                                       
2000-01-01 00:07:30  21 

There may be a more elegant solution, but I hope this helps.

Upvotes: 1

Related Questions