Partition dataframe to overlapping segments

Question

I have a dataframe with time indices and need to partition it into overlapping segments (say 5 days in length with 1 day offset).

Here is a sample dataframe:

import pandas as pd    
i=pd.date_range('05-01-2015', '21-05-2018', freq='D')
x=pd.DataFrame(index=i, data = np.random.randint(0, 100, len(i)))

After partitioning the indices of the segments should range respectively from 2015-05-01 to 2015-05-05, from 2015-05-02 to 2015-05-06, from 2015-05-03 to 2015-05-08 etc. I assume there should be some trick with groupby to do it, but couldn't come up with an efficient implementation.

As a reference, this is implemented in mathematica:

Partition[list,n,d] - generates sublists with offset d.

I would greatly appreciate any insight you guys can provide.

economy · Accepted Answer

As @alkasm pointed out, you can perform aggregate calculations on 5-day partitions using your sample dataframe using rolling in pandas:

import pandas as pd    
i = pd.date_range('05-01-2015', '21-05-2018', freq='D')
x = pd.DataFrame(index=i, data = np.random.randint(0, 100, len(i)))

x.rolling(5).sum()

The trick with this is, it calculates the aggregate function looking backwards, so you may need to expand your index to include the four days prior. A row in your dataframe with index 2018-05-06 with a 5-row rolling() call would represent the segment 2018-05-02 to 2018-05-06 inclusive (which is 5 days).

Partition dataframe to overlapping segments

Answers (1)

Related Questions