user4533817
user4533817

Reputation:

Rolling average pairwise correlation in Python

I have daily returns from three markets (GLD, SPY, and USO). My goal is to calculate the the average pairwise correlation from a correlation matrix on a rolling basis of 130 days.

My starting point was:

import numpy as np
import pandas as pd
import os as os
import pandas.io.data as web
import datetime as datetime
from pandas.io.data import DataReader

stocks = ['spy', 'gld', 'uso']
start = datetime.datetime(2010,1,1)
end = datetime.datetime(2016,1,1)

df = web.DataReader(stocks, 'yahoo', start, end)
adj_close_df = df['Adj Close']

returns = adj_close_df.pct_change(1).dropna()
returns = returns.dropna()

rollingcor = returns.rolling(130).corr()

This creates a panel of correlation matrices. However, extracting the lower(or upper) triangles, removing the diagonals and then calculating the average for each observation is where I've drawn a blank. Ideally I would like the output for each date to be in a Series where I can then index it by the dates.

Maybe I've started from the wrong place but any help would be appreciated.

Upvotes: 6

Views: 6578

Answers (2)

piRSquared
piRSquared

Reputation: 294506

You could use numpy's tril to access the lower triangle of the dataframe.

def tril_sum(df):
    # -1 ensures we skip the diagonal
    return np.tril(df.unstack().values, -1).sum()

Calculates the sum of the lower triangle of the matrix. Notice the unstack() in the middle of that. I'm expecting to have a multiindex series that I'll need to pivot to a dataframe.

Then apply it to your panel

n = len(stock)
avg_cor = rollingcor.dropna().to_frame().apply(tril_sum) / ((n ** 2 - n) / 2)

Looks like:

print avg_cor.head()

Date
2010-07-12    0.398973
2010-07-13    0.403664
2010-07-14    0.402483
2010-07-15    0.403252
2010-07-16    0.407769
dtype: float64

This answer skips the diagonals.

Upvotes: 0

aldanor
aldanor

Reputation: 3481

To get the average pairwise correlation, you can find the sum of the correlation matrix, substract n (ones on the diagonal), divide by 2 (symmetry), and finally divide by n (average). I think this should do it:

>>> n = len(stocks)
>>> ((rollingcor.sum(skipna=0).sum(skipna=0) - n) / 2) / n
Date
2010-01-05         NaN
2010-01-06         NaN
2010-01-07         NaN
                ...   
2015-12-29    0.164356
2015-12-30    0.168102
2015-12-31    0.166462
dtype: float64

Upvotes: 4

Related Questions