Looping through a pandas DataFrame accessing previous elements

Question

I have two DataFrames, FirstColumn & SecondColumn.

How do I create a new column, containing the correlation coeff. row by row for the two columns 5 periods back?

For example, the 5th row would be the R2 value of the two columns 5 periods back, the 6th row would be the corr coeff. value of the columns ranging from row 1-6 etc etc.

Additionally, what method is the most efficient when looping through a DataFrame, having to access previous rows?

FirstColumn SecondColumn
0   2   1.0
1   3   3.0
2   4   4.0
3   5   5.0
4   6   2.0
5   7   6.0
6   2   2.0
7   3   3.0
8   5   9.0
9   3   2.0
10  2   3.0
11  4   2.0
12  2   2.0
13  4   2.0
14  2   4.0
15  5   3.0
16  3   1.0

Georgina Skibinski · Accepted Answer

You can do:

df["corr"]=df.rolling(5, min_periods=1).corr()["FirstColumn"].loc[(slice(None), "SecondColumn")]

Outputs:

    FirstColumn  SecondColumn      corr
0           2.0           1.0       NaN
1           3.0           3.0  1.000000
2           4.0           4.0  0.981981
3           5.0           5.0  0.982708
4           6.0           2.0  0.400000
5           7.0           6.0  0.400000
6           2.0           2.0  0.566707
7           3.0           3.0  0.610572
8           5.0           9.0  0.426961
9           3.0           2.0  0.737804
10          2.0           3.0  0.899659
11          4.0           2.0  0.698774
12          2.0           2.0  0.716769
13          4.0           2.0 -0.559017
14          2.0           4.0 -0.612372
15          5.0           3.0 -0.250000
16          3.0           1.0 -0.067267

Looping through a pandas DataFrame accessing previous elements

Answers (2)

Related Questions