Reputation: 325

In Python, how do we find the correlation coefficient between two matrices?

I have two matrices, say T1 and T2, each of size m x n. I want to find the correlation coefficient between the two matrices.
So far I haven't used any built-in library function for it. I am following these steps:
First I calculate the mean of each of the two matrices:

M1 = T1.mean()
M2 = T2.mean()

and then I subtract the mean from the corresponding matrices as:

A = np.subtract(T1, M1)
B = np.subtract(T2, M2)

where np is the numpy library and A and B are the resulting matrices after doing the subtraction.
Now I calculate the correlation coefficient as:

alpha = np.sum(A*B) / (np.sqrt((np.sum(A))*np.sum(B)))

However, the value I get is far greater than 1 and is not meaningful at all. It should be between 0 and 1 to be meaningful.
I have also tried using the absolute values of matrices A and B, but that didn't work either.
I also tried to use:

np.sum(np.dot(A,B.T)) instead of np.sum(A*B)

in the numerator, but that didn't work either.
Edit1:
This is the formula that I intend to calculate:

In this image, C is one of the matrices and T is another one.
'u' is the mean symbol.

Can somebody tell me where exactly I am making the mistake?

Upvotes: 3

Answers (3)

Roger V.

Reputation: 620

From the way the problem is described in the OP, the matrices are treated as arrays, so one could simply flatten them:

x = T1.flatten()
y = T2.flatten()

One could then use either the builtin numpy function proposed by @AakashMakwana:

import numpy as np
r = np.corrcoef(x, y)[0,1]

Remark: Note that without flattening this solution would produce the matrix of pairwise correlations.
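To illustrate the remark, here is a minimal sketch with made-up random data. The 3 x 4 shape and the seed are assumptions chosen only for illustration; they are not from the question.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
T1 = rng.random((3, 4))          # hypothetical m x n matrices (m=3, n=4)
T2 = rng.random((3, 4))

# Without flattening: every row is treated as a separate variable, so the
# 3 rows of T1 plus the 3 rows of T2 give a 6 x 6 matrix of pairwise
# correlations.
pairwise = np.corrcoef(T1, T2)
print(pairwise.shape)            # (6, 6)

# With flattening: two 1-D variables, so a 2 x 2 matrix whose off-diagonal
# entry is the single coefficient between the two matrices.
r = np.corrcoef(T1.flatten(), T2.flatten())[0, 1]
print(r)
```

In general, for two m x n inputs the unflattened call returns a 2m x 2m matrix.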

Alternatively, one could use the equivalent scipy function:

from scipy.stats import pearsonr
r = pearsonr(x,y)[0]

SciPy additionally provides the Spearman rank correlation coefficient (spearmanr(x,y)[0]) and Kendall's tau (kendalltau(x,y)[0]).
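As a rough sketch of how the three SciPy functions compare, assuming some made-up random matrices in place of the real T1 and T2 (the data, shapes and seed below are illustrative only):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

rng = np.random.default_rng(1)   # arbitrary seed, only for reproducibility
T1 = rng.random((3, 4))          # hypothetical stand-ins for the real matrices
T2 = rng.random((3, 4))
x = T1.flatten()
y = T2.flatten()

r_pearson = pearsonr(x, y)[0]    # linear correlation
r_spearman = spearmanr(x, y)[0]  # rank-based correlation
tau = kendalltau(x, y)[0]        # rank-based, counts concordant pairs
print(r_pearson, r_spearman, tau)

# The rank-based measures only look at ordering, so a monotonic but
# non-linear transform of x is still "perfectly" correlated with x for them,
# while the Pearson coefficient drops below 1.
y_mono = np.exp(x)
print(pearsonr(x, y_mono)[0], spearmanr(x, y_mono)[0], kendalltau(x, y_mono)[0])
```

Which of the three is appropriate depends on whether a linear or merely monotonic relationship between the matrix entries is of interest.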

Upvotes: 0

aamer aamer

Reputation: 325

Well, I think this function does what I intended:

def correlation_coefficient(T1, T2):
    numerator = np.mean((T1 - T1.mean()) * (T2 - T2.mean()))
    denominator = T1.std() * T2.std()
    if denominator == 0:
        return 0
    else:
        result = numerator / denominator
        return result

The numerator is the tricky part here: it is the mean of the products of the deviations rather than the sum shown in the formula in the image above. The denominator is simply the product of the standard deviations of the two images.
However, the result does make sense now, as it lies between 0 and 1 (in general, a correlation coefficient can range from -1 to 1).
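For what it is worth, the mean/std form and the usual sum form of the Pearson coefficient are algebraically the same, because the 1/N factors cancel. The sketch below checks this numerically on made-up random data (shapes and seed are assumptions for illustration). It also suggests where the attempt in the question likely went wrong: its denominator summed the deviations A and B themselves, which add up to roughly zero by construction, instead of their squares.

```python
import numpy as np

def correlation_coefficient(T1, T2):
    # Same computation as the function above, written compactly.
    numerator = np.mean((T1 - T1.mean()) * (T2 - T2.mean()))
    denominator = T1.std() * T2.std()
    return 0 if denominator == 0 else numerator / denominator

rng = np.random.default_rng(2)   # arbitrary seed, only for reproducibility
T1 = rng.random((3, 4))          # hypothetical stand-ins for the real matrices
T2 = rng.random((3, 4))

A = T1 - T1.mean()
B = T2 - T2.mean()

# Sum form: note the squared deviations inside the square root.
r_sum = np.sum(A * B) / np.sqrt(np.sum(A**2) * np.sum(B**2))
r_func = correlation_coefficient(T1, T2)
r_numpy = np.corrcoef(T1.ravel(), T2.ravel())[0, 1]
print(r_sum, r_func, r_numpy)    # the three values agree up to rounding

# The deviations themselves sum to (numerically) zero, which is why
# dividing by sqrt(np.sum(A) * np.sum(B)) gives meaningless values.
print(np.sum(A), np.sum(B))

# The coefficient can also be negative: a matrix against its own negation.
r_neg = correlation_coefficient(T1, -T1)
print(r_neg)
```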

Upvotes: 0

Aakash Makwana

Reputation: 754

Can you try this:

import numpy as np
x = np.array([[0.1, .32, .2, 0.4, 0.8], [.23, .18, .56, .61, .12]])
y = np.array([[2,4,0.1, .32, .2],[1,3,.23, .18, .56]])
pearson = np.corrcoef(x,y)
print(pearson)
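A short sketch of how the printed result can be read, using the same x and y as above. The names cross and r are introduced here only for illustration.

```python
import numpy as np

x = np.array([[0.1, .32, .2, 0.4, 0.8], [.23, .18, .56, .61, .12]])
y = np.array([[2, 4, 0.1, .32, .2], [1, 3, .23, .18, .56]])

pearson = np.corrcoef(x, y)
print(pearson.shape)   # (4, 4): the 2 rows of x followed by the 2 rows of y

# The top-right 2 x 2 block holds the correlations between rows of x and
# rows of y; cross[i, j] is the coefficient between x[i] and y[j].
cross = pearson[:2, 2:]
print(np.isclose(cross[0, 0], np.corrcoef(x[0], y[0])[0, 1]))   # True

# For one coefficient over all elements, flatten both arrays first.
r = np.corrcoef(x.ravel(), y.ravel())[0, 1]
print(r)
```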

Upvotes: 1
