Reputation:
I have daily returns from three markets (GLD, SPY, and USO). My goal is to calculate the the average pairwise correlation from a correlation matrix on a rolling basis of 130 days.
My starting point was:
import numpy as np
import pandas as pd
import os as os
import pandas.io.data as web
import datetime as datetime
from pandas.io.data import DataReader
stocks = ['spy', 'gld', 'uso']
start = datetime.datetime(2010,1,1)
end = datetime.datetime(2016,1,1)
df = web.DataReader(stocks, 'yahoo', start, end)
adj_close_df = df['Adj Close']
returns = adj_close_df.pct_change(1).dropna()
returns = returns.dropna()
rollingcor = returns.rolling(130).corr()
This creates a panel of correlation matrices. However, extracting the lower(or upper) triangles, removing the diagonals and then calculating the average for each observation is where I've drawn a blank. Ideally I would like the output for each date to be in a Series where I can then index it by the dates.
Maybe I've started from the wrong place but any help would be appreciated.
Upvotes: 6
Views: 6578
Reputation: 294506
You could use numpy
's tril
to access the lower triangle of the dataframe.
def tril_sum(df):
# -1 ensures we skip the diagonal
return np.tril(df.unstack().values, -1).sum()
Calculates the sum of the lower triangle of the matrix. Notice the unstack()
in the middle of that. I'm expecting to have a multiindex series that I'll need to pivot to a dataframe.
Then apply it to your panel
n = len(stock)
avg_cor = rollingcor.dropna().to_frame().apply(tril_sum) / ((n ** 2 - n) / 2)
Looks like:
print avg_cor.head()
Date
2010-07-12 0.398973
2010-07-13 0.403664
2010-07-14 0.402483
2010-07-15 0.403252
2010-07-16 0.407769
dtype: float64
This answer skips the diagonals.
Upvotes: 0
Reputation: 3481
To get the average pairwise correlation, you can find the sum of the correlation matrix, substract n
(ones on the diagonal), divide by 2 (symmetry), and finally divide by n
(average). I think this should do it:
>>> n = len(stocks)
>>> ((rollingcor.sum(skipna=0).sum(skipna=0) - n) / 2) / n
Date
2010-01-05 NaN
2010-01-06 NaN
2010-01-07 NaN
...
2015-12-29 0.164356
2015-12-30 0.168102
2015-12-31 0.166462
dtype: float64
Upvotes: 4