Reputation: 537

Why does the correlation of two matrices return nan?

Consider the following code:

import numpy as np
from scipy.stats import pearsonr
A = np.ones([5,5])
B = np.ones([5,5])
pearsonr(A.flatten(), B.flatten())

My question is: why does the last line of code return the following?

(nan, 1.0)

Upvotes: 1

Answers (1)

Anil_M

Reputation: 11453

When I run your code, I get the following warnings. Are you not seeing them?

    In [6]: runfile('C:/Users/a*/.spyder-py3/temp.py', wdir='C:/Users/a*/.spyder-py3')
    (nan, 1.0)
    C:\Anaconda3\lib\site-packages\scipy\stats\stats.py:3029: RuntimeWarning: invalid value encountered in double_scalars
      r = r_num / r_den
    C:\Anaconda3\lib\site-packages\scipy\stats\stats.py:5084: RuntimeWarning: invalid value encountered in less
      x = np.where(x < 1.0, x, 1.0)  # if x > 1 then return 1.0

Digging deeper into stats.py, in the pearsonr function, it appears that r_den is equal to zero. Because the inputs are constant, r_num is zero as well, so the division r = r_num / r_den on line 2558 evaluates 0/0, and r (Pearson's correlation coefficient) comes out as NaN.

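Strictly speaking, r is undefined here: both arrays are constant, so their standard deviations are zero and the formula reduces to 0/0. If you need a number for this case, treating it as zero correlation is a common convention, but it is a choice rather than a mathematical result.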
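To see why the denominator vanishes, here is a minimal sketch in plain NumPy. It mirrors the textbook formula rather than SciPy's actual source; the names r_num and r_den are borrowed from the warning message above, and a and b are illustrative.

```python
import numpy as np

A = np.ones([5, 5]).flatten()
B = np.ones([5, 5]).flatten()

# Center each array; a constant array becomes all zeros.
a = A - A.mean()
b = B - B.mean()

r_num = np.sum(a * b)                             # covariance-like term
r_den = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))  # product of the norms

print(r_num, r_den)  # both are 0.0, so r_num / r_den would be 0/0
```

Any pair of inputs where at least one array is constant ends up in the same situation, because the centered constant array contributes a zero to both the numerator and the denominator.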

You can try your own function on the same data, as below (adapted from this SO posting):

import numpy as np

def pearsonr(X, Y):
    ''' Takes X & Y as numpy arrays,
        returns the Pearson correlation coefficient.
        Note: X and Y are modified in place.
    '''
    # Center X and Y
    X -= X.mean(0)
    Y -= Y.mean(0)
    # Standardise X and Y (the std is zero for constant input)
    X /= X.std(0)
    Y /= Y.std(0)
    # Compute mean product
    return np.mean(X*Y)

A = np.ones([5,5]).flatten()
B = np.ones([5,5]).flatten()
print(pearsonr(A, B))

This still gives the same kind of warning as the pearsonr function in stats.py, and the returned value of r is again NaN.

    In [7]: runfile('C:/Users/a*/.spyder-py3/temp.py', wdir='C:/Users/a*/.spyder-py3')
    nan
    C:/Users/a*/.spyder-py3/temp.py:14: RuntimeWarning: invalid value encountered in true_divide
      X /= X.std(0)
    C:/Users/amandr/.spyder-py3/temp.py:15: RuntimeWarning: invalid value encountered in true_divide
      Y /= Y.std(0)

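You can handle this case yourself and return zero for r when the values are identical. Note that a RuntimeWarning is not an exception, so a plain try/except will not catch it by default; you first have to tell NumPy to raise on invalid operations (for example with np.errstate(invalid='raise')), or check for constant input before dividing.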
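A minimal sketch of that idea, assuming you want a fallback value for constant input. The names safe_pearsonr and fallback are hypothetical, not part of SciPy or NumPy; np.errstate is the real NumPy context manager that turns the "invalid value" warning into a FloatingPointError.

```python
import numpy as np

def safe_pearsonr(x, y, fallback=0.0):
    """Pearson's r, returning `fallback` when the division is 0/0.

    `safe_pearsonr` and `fallback` are hypothetical names for this sketch.
    """
    # Flatten to float arrays; the subtractions below create new arrays,
    # so the caller's data is left untouched.
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xc = x - x.mean()
    yc = y - y.mean()
    try:
        # Make NumPy raise FloatingPointError instead of emitting a warning.
        with np.errstate(invalid="raise", divide="raise"):
            return float(np.mean((xc / xc.std()) * (yc / yc.std())))
    except FloatingPointError:
        return fallback

A = np.ones([5, 5])
B = np.ones([5, 5])
print(safe_pearsonr(A, B))                  # constant inputs: the fallback
print(safe_pearsonr([1, 2, 3], [2, 4, 6]))  # close to 1.0
```

Passing fallback=float('nan') keeps the "undefined" meaning while still silencing the warning. Testing up front whether either input is constant would work just as well and avoids relying on floating-point error state.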

More on Pearson's correlation here and here.

Upvotes: 1
