Efficient calculation across a dictionary consisting of thousands of correlation matrices

Question

Based on a large dataset of daily observations from 20 assets, I created a dictionary which comprises (rolling) correlation matrices. I am using the date index as a key for the dictionary.

What I want to do now (in an efficient manner) is to compare every pair of correlation matrices within the dictionary and save the results in a new matrix. The idea is to compare correlation structures over time.

import pandas as pd
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.cluster.hierarchy import cophenet


key_list = dict_corr.keys()

# Create empty matrix
X = np.empty(shape=[len(key_list),len(key_list)])

# enumerate supplies the row/column positions, so no manual counters are needed
for key1_index, key1 in enumerate(key_list):

    # Extract correlation matrix from dictionary
    corr1_temp = dict_corr[key1]

    # Transform correlation matrix into distance matrix
    dist1_temp = ((1-corr1_temp)/2.)**.5

    # Extract hierarchical structure from distance matrix
    link1_temp = linkage(dist1_temp,'single')

    for key2_index, key2 in enumerate(key_list):

        corr2_temp = dict_corr[key2]
        dist2_temp = ((1-corr2_temp)/2.)**.5
        link2_temp = linkage(dist2_temp,'single')

        # Compare hierarchical structure between the two correlation matrices -> results in 2x2 matrix
        temp = np.corrcoef(cophenet(link1_temp),cophenet(link2_temp))

        # Extract from the resulting 2x2 matrix the correlation
        X[key1_index, key2_index] = temp[1,0]

I'm well aware that two nested for loops are probably the least efficient way to do this.

So I'm grateful for any helpful comment on how to speed up the calculations!

Best

DavideBrex · Accepted Answer

You can use itertools and move your correlation code into a function (compute_corr) that is called from a single for loop:

import itertools
for key_1, key_2 in itertools.combinations(dict_corr, 2):
    correlation = compute_corr(key_1, key_2, dict_corr)
    #now store correlation in a list

If you care about the order, use itertools.permutations(dict_corr, 2) instead of combinations.
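As a quick illustration of the difference, here is a small sketch on three made-up date keys (the key values are only an assumption for the example). It also includes itertools.product for comparison, since that one additionally pairs each key with itself:

```python
import itertools

# Hypothetical date keys, standing in for the keys of dict_corr
keys = ["2020-01-01", "2020-01-02", "2020-01-03"]

# Each unordered pair once, no key paired with itself
unordered = list(itertools.combinations(keys, 2))

# Both orders of every pair, still no key paired with itself
ordered = list(itertools.permutations(keys, 2))

# Full grid: both orders plus every key paired with itself
full_grid = list(itertools.product(keys, repeat=2))

print(len(unordered), len(ordered), len(full_grid))
```

For n keys these produce n*(n-1)/2, n*(n-1) and n*n pairs respectively.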

EDIT

Since you want all possible combinations of keys (including a key with itself), you should use itertools.product.

l_corr = [] #list to store all the output from the function
for key_1, key_2 in itertools.product(key_list, repeat= 2 ):
    l_corr.append(compute_corr(key_1, key_2, dict_corr))

Now l_corr has len(key_list)*len(key_list) elements. You can convert this list to a matrix like this:

np.array(l_corr).reshape(len(key_list),len(key_list))
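If compute_corr is written like the code in the question, it rebuilds both linkages for every pair. A minimal sketch of one possible alternative, under assumptions that go beyond the approach above: compute each cophenetic vector once, stack the vectors, and let np.corrcoef produce the whole matrix in one call. The helper names cophenetic_vector and compare_all are hypothetical. The sketch also assumes the intent is to cluster on the distance matrix itself, so it converts it to condensed form with squareform; the question's code passes the square matrix directly, which SciPy's linkage treats as a set of observation vectors, so the numbers can differ.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform


def cophenetic_vector(corr):
    """Condensed cophenetic distances of the single-linkage tree for one correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    # Same correlation-to-distance transform as in the question; clip guards against
    # tiny negative values caused by floating-point error
    dist = np.sqrt(np.clip((1.0 - corr) / 2.0, 0.0, None))
    np.fill_diagonal(dist, 0.0)
    # Assumption: treat dist as precomputed distances, so pass the condensed form
    condensed = squareform(dist, checks=False)
    return cophenet(linkage(condensed, method="single"))


def compare_all(dict_corr):
    """Return the key order and the matrix of pairwise cophenetic correlations."""
    keys = list(dict_corr)
    # One linkage per key instead of one per pair
    vectors = np.vstack([cophenetic_vector(dict_corr[k]) for k in keys])
    # Each row is treated as one variable, so the result is len(keys) x len(keys)
    return keys, np.corrcoef(vectors)


# Made-up data for illustration only: 4 "dates", 5 assets, 50 observations each
rng = np.random.default_rng(0)
demo = {i: np.corrcoef(rng.normal(size=(5, 50))) for i in range(4)}
keys, X = compare_all(demo)
print(X.shape)
```

With n keys this needs n linkages rather than roughly n*n, and the pairwise correlations are handled by a single vectorised call. A cophenetic vector whose entries are all equal would give NaN in the result, so that case may need separate handling.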

Dummy example:

def compute_corr(key_1, key_2, dict_corr):
    return key_1 * key_2 #dummy result from the function

dict_corr={1:"a",2:"b",3:"c",4:"d",5:"f"}
key_list = dict_corr.keys()

l_corr = []
for key_1, key_2 in itertools.product(key_list, repeat= 2 ):
    print(key_1, key_2)
    l_corr.append(compute_corr(key_1, key_2, dict_corr))

Combinations: the loop prints all 25 ordered key pairs, from 1 1 to 5 5.

Create the final matrix:

np.array(l_corr).reshape(len(key_list),len(key_list))

array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10],
       [ 3,  6,  9, 12, 15],
       [ 4,  8, 12, 16, 20],
       [ 5, 10, 15, 20, 25]])
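The matrix above is symmetric, and the same holds for a correlation between two cophenetic vectors. A sketch of how that could be used to skip almost half of the calls, assuming compute_corr(a, b) always equals compute_corr(b, a): iterate over itertools.combinations_with_replacement and write each result into both triangles. The dummy compute_corr and dict_corr are the same as in the example above.

```python
import itertools
import numpy as np


def compute_corr(key_1, key_2, dict_corr):
    return key_1 * key_2  # same dummy result as in the example above


dict_corr = {1: "a", 2: "b", 3: "c", 4: "d", 5: "f"}
key_list = list(dict_corr)
n = len(key_list)

X = np.empty((n, n))
calls = 0
# Pairs (i, j) with i <= j only: the diagonal plus the upper triangle
for (i, key_1), (j, key_2) in itertools.combinations_with_replacement(enumerate(key_list), 2):
    # Assumption: the comparison is symmetric, so one call fills both cells
    X[i, j] = X[j, i] = compute_corr(key_1, key_2, dict_corr)
    calls += 1

print(calls)
```

For n keys this makes n*(n+1)/2 calls instead of n*n.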

Let me know if I missed something. I hope this helps.
