Gautham Kanthasamy
Gautham Kanthasamy

Reputation: 229

Generate Covariance in Pandas as a New Value

I have a total of 4 columns in my dataframe. I would like to generate the Covariance between every row and its 1st lag row. How do I do the same? A subset of my dataframe along with the desired column 'Covariance' is given below:

 A1      A2      A3      A4      Covariance 
24.54   63.32   47.12   60.66      nan
16.33   64.39   49.72   56.94     73.43
46.38   55.94   42.35   73.43     50.27

Basically, the Covariance column should contain the value as if the COVARIANCE.P formula was used between a row and its lag in Microsoft EXCEL. As of now, I am generating new columns with the lag variables and trying to obtain the Covariance along a row of the data frame. This is the code that I have:

df = pd.DataFrame({'A1':[19, 20, 21],
               'A2':[24.54, 16.33, 46.38],
               'A3':[63.32, 64.39, 55.94],
               'A4':[47.12, 49.72, 42.35]})
df['Lag_A1'] = df['A1].shift(-1)
df['Lag_A2'] = df['A2].shift(-1)
df['Lag_A3'] = df['A3].shift(-1)
df['Lag_A4'] = df['A4].shift(-1)
Main = ['A1', 'A2', 'A3', 'A4']
Lag_1 = ['Lag_1_A1', 'Lag_1_A2', 'Lag_1_A3', 'Lag_1_A4']
Lag_1_Cov = df.apply(lambda x: x[Main].T.cov(x[Lag_1].T))

However, this code does not work. Please help.

Upvotes: 0

Views: 189

Answers (2)

jpp
jpp

Reputation: 164823

You can use numpy for this calculation. However, I cannot replicate your covariance values.

Note that we set ddof = 0 for population covariance.

import numpy as np

arr = df.values
df['Covariance'] = [np.nan] + [np.cov(i, j, ddof=0)[0][1] for i, j in zip(arr, arr[1:])]

Result:

      A1     A2     A3     A4  Covariance
0  24.54  63.32  47.12  60.66         NaN
1  16.33  64.39  49.72  56.94  277.486000
2  46.38  55.94  42.35  73.43  107.303425

Upvotes: 0

BENY
BENY

Reputation: 323376

If I understand.

df['Cov']=[df.iloc[:,:4].iloc[x].cov(df.iloc[:,:4].shift(1).iloc[x]) for x in range(len(df))]
df
Out[147]: 
      A1     A2     A3     A4  Covariance         Cov
0  24.54  63.32  47.12  60.66         NaN         NaN
1  16.33  64.39  49.72  56.94       73.43  369.981333
2  46.38  55.94  42.35  73.43       50.27  143.071233

Upvotes: 1

Related Questions