Reputation: 201
I have a dataframe with time indices and need to partition it into overlapping segments (say 5 days in length with 1 day offset).
Here is a sample dataframe:
import pandas as pd
i=pd.date_range('05-01-2015', '21-05-2018', freq='D')
x=pd.DataFrame(index=i, data = np.random.randint(0, 100, len(i)))
After partitioning the indices of the segments should range respectively from 2015-05-01
to 2015-05-05
, from 2015-05-02
to 2015-05-06
, from 2015-05-03
to 2015-05-08
etc. I assume there should be some trick with groupby
to do it, but couldn't come up with an efficient implementation.
As a reference, this is implemented in mathematica:
Partition[list,n,d]
- generates sublists with offset d.
I would greatly appreciate any insight you guys can provide.
Upvotes: 1
Views: 230
Reputation: 4251
As @alkasm pointed out, you can perform aggregate calculations on 5-day partitions using your sample dataframe using rolling
in pandas:
import pandas as pd
i = pd.date_range('05-01-2015', '21-05-2018', freq='D')
x = pd.DataFrame(index=i, data = np.random.randint(0, 100, len(i)))
x.rolling(5).sum()
The trick with this is, it calculates the aggregate function looking backwards, so you may need to expand your index to include the four days prior. A row in your dataframe with index 2018-05-06
with a 5-row rolling()
call would represent the segment 2018-05-02
to 2018-05-06
inclusive (which is 5 days).
Upvotes: 2