Reputation: 229
I have a total of 4 columns in my dataframe. I would like to generate the Covariance between every row and its 1st lag row. How do I do the same? A subset of my dataframe along with the desired column 'Covariance' is given below:
A1 A2 A3 A4 Covariance
24.54 63.32 47.12 60.66 nan
16.33 64.39 49.72 56.94 73.43
46.38 55.94 42.35 73.43 50.27
Basically, the Covariance column should contain the value as if the COVARIANCE.P formula was used between a row and its lag in Microsoft EXCEL. As of now, I am generating new columns with the lag variables and trying to obtain the Covariance along a row of the data frame. This is the code that I have:
df = pd.DataFrame({'A1':[19, 20, 21],
'A2':[24.54, 16.33, 46.38],
'A3':[63.32, 64.39, 55.94],
'A4':[47.12, 49.72, 42.35]})
df['Lag_A1'] = df['A1].shift(-1)
df['Lag_A2'] = df['A2].shift(-1)
df['Lag_A3'] = df['A3].shift(-1)
df['Lag_A4'] = df['A4].shift(-1)
Main = ['A1', 'A2', 'A3', 'A4']
Lag_1 = ['Lag_1_A1', 'Lag_1_A2', 'Lag_1_A3', 'Lag_1_A4']
Lag_1_Cov = df.apply(lambda x: x[Main].T.cov(x[Lag_1].T))
However, this code does not work. Please help.
Upvotes: 0
Views: 189
Reputation: 164823
You can use numpy
for this calculation. However, I cannot replicate your covariance values.
Note that we set ddof = 0
for population covariance.
import numpy as np
arr = df.values
df['Covariance'] = [np.nan] + [np.cov(i, j, ddof=0)[0][1] for i, j in zip(arr, arr[1:])]
Result:
A1 A2 A3 A4 Covariance
0 24.54 63.32 47.12 60.66 NaN
1 16.33 64.39 49.72 56.94 277.486000
2 46.38 55.94 42.35 73.43 107.303425
Upvotes: 0
Reputation: 323376
If I understand.
df['Cov']=[df.iloc[:,:4].iloc[x].cov(df.iloc[:,:4].shift(1).iloc[x]) for x in range(len(df))]
df
Out[147]:
A1 A2 A3 A4 Covariance Cov
0 24.54 63.32 47.12 60.66 NaN NaN
1 16.33 64.39 49.72 56.94 73.43 369.981333
2 46.38 55.94 42.35 73.43 50.27 143.071233
Upvotes: 1