IamTheWalrus
IamTheWalrus

Reputation: 604

Processing pandas dataframe with a moving window

Taking the following dataframe as an example:

                     a
2011-01-01 00:00:00  9
2011-01-01 01:00:00  4
2011-01-01 02:00:00  2
2011-01-01 03:00:00  5
2011-01-01 04:00:00  3
2011-01-01 05:00:00  7
2011-01-01 06:00:00  8
2011-01-01 07:00:00  4

I want to process chunks of "a" column with a moving window, using the function autoCorr() below to generate "b" column accordingly.

def autoCorr(x):
    return np.corrcoef(x[:-1], x[1:])[0,1]

For a moving window of length 5, the final result should then be:

                     a       b
2011-01-01 00:00:00  9  0.0000
2011-01-01 01:00:00  4  0.0000
2011-01-01 02:00:00  2  0.0000
2011-01-01 03:00:00  5  0.0000
2011-01-01 04:00:00  3 -0.0877    # autoCorr([9,4,2,5,3])
2011-01-01 05:00:00  7 -0.6404    # autoCorr([4,2,5,3,7])
2011-01-01 06:00:00  8  0.2880    # autoCorr([2,5,3,7,8])
2011-01-01 07:00:00  4 -0.1580    # autoCorr([5,3,7,8,4])

Upvotes: 1

Views: 214

Answers (1)

jezrael
jezrael

Reputation: 863421

Use Rolling.apply + fillna:

df = df.rolling(5).apply(autoCorr).fillna(0)
print (df)
                            a
2011-01-01 00:00:00  0.000000
2011-01-01 01:00:00  0.000000
2011-01-01 02:00:00  0.000000
2011-01-01 03:00:00  0.000000
2011-01-01 04:00:00 -0.087706
2011-01-01 05:00:00 -0.640445
2011-01-01 06:00:00  0.288136
2011-01-01 07:00:00 -0.157877

df = df.rolling(5).apply(lambda x: pd.Series(x).autocorr(1)).fillna(0)
print (df)

                            a
2011-01-01 00:00:00  0.000000
2011-01-01 01:00:00  0.000000
2011-01-01 02:00:00  0.000000
2011-01-01 03:00:00  0.000000
2011-01-01 04:00:00 -0.087706
2011-01-01 05:00:00 -0.640445
2011-01-01 06:00:00  0.288136
2011-01-01 07:00:00 -0.157877

EDIT:

df1 = df.join(df.rolling(5).apply(autoCorr).fillna(0)['a'].rename('b'))
print (df1)
                     a         b
2011-01-01 00:00:00  9  0.000000
2011-01-01 01:00:00  4  0.000000
2011-01-01 02:00:00  2  0.000000
2011-01-01 03:00:00  5  0.000000
2011-01-01 04:00:00  3 -0.087706
2011-01-01 05:00:00  7 -0.640445
2011-01-01 06:00:00  8  0.288136
2011-01-01 07:00:00  4 -0.157877

Upvotes: 1

Related Questions