Elgin Cahangirov

Reputation: 2022

Correlation table

Suppose you have hundreds of numpy arrays and want to calculate the correlation between every pair of them. I did this with nested for loops, but execution took a very long time (20 minutes!). One way to make the calculation more efficient is to compute only one triangle of the correlation table, copy it to the other half, and set the diagonal to 1, since correlation(x, y) = correlation(y, x) and correlation(x, x) is always equal to 1. However, even with these changes the code still takes a long time (approx. 7-8 minutes). Any other suggestions?

My code
import numpy as np

# data_set: the collection of 1-D numpy arrays
for x in data_set:
    for y in data_set:
        correlation = np.corrcoef(x, y)[1][0]

Upvotes: 1

Views: 758

Answers (1)

FLab

Reputation: 7496

I am quite sure you can achieve much faster results by creating a 2-D array and calculating its correlation matrix in a single call (as opposed to calculating pairwise correlations one by one).

From numpy's corrcoef documentation, the input can be: "1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables." https://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html
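As a rough sketch of what that could look like, assuming all arrays are 1-D and have the same length (the random data_set below is only a made-up stand-in for your data, and the sizes are arbitrary):

```python
import numpy as np

# Hypothetical stand-in for the real data: 200 arrays of 1000 observations each.
rng = np.random.default_rng(0)
data_set = [rng.normal(size=1000) for _ in range(200)]

# Stack into shape (n_arrays, n_observations): one row per variable.
stacked = np.vstack(data_set)

# A single call returns the full (n_arrays, n_arrays) correlation matrix.
corr_matrix = np.corrcoef(stacked)

# Spot-check one entry against the pairwise call from the question.
pairwise = np.corrcoef(data_set[3], data_set[7])[1][0]
print(np.isclose(corr_matrix[3, 7], pairwise))
```

Here corr_matrix[i, j] is the correlation between data_set[i] and data_set[j], so the matrix is already symmetric with ones on the diagonal and there is no need to fill in the halves by hand. If the arrays have different lengths they cannot be stacked this way, and this approach would not apply as written.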

Upvotes: 1
