Reputation: 537
Consider the following code:
import numpy as np
from scipy.stats import pearsonr
A = np.ones([5,5])
B = np.ones([5,5])
pearsonr(A.flatten(), B.flatten())
My question is: why does the last line of code return the following?
(nan, 1.0)
Upvotes: 1
Views: 2042
Reputation: 11453
When I run your code, I get the following warnings. Are you not seeing them?
In [6]:runfile('C:/Users/a*/.spyder-py3/temp.py', wdir='C:/Users/a*/.spyder-py3')
(nan, 1.0)
C:\Anaconda3\lib\site-packages\scipy\stats\stats.py:3029:
RuntimeWarning: invalid value encountered in double_scalars
r = r_num / r_den
C:\Anaconda3\lib\site-packages\scipy\stats\stats.py:5084:
RuntimeWarning: invalid value encountered in less
x = np.where(x < 1.0, x, 1.0) # if x > 1 then return 1.0
When I dig deeper by opening stats.py, the pearsonr function shows that the value of r_den is equal to zero. With constant inputs the numerator r_num is zero as well, so the division r = r_num / r_den that computes Pearson's correlation coefficient (line 2558) evaluates 0/0, and hence r = NaN.
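Strictly speaking, Pearson's correlation is undefined when either input is constant, because the denominator is zero; two identical sets that do vary have a correlation of one, not zero. If you need a number for constant inputs, returning zero is a convention you can choose.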
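To see where the 0/0 comes from, here is a minimal sketch that computes the numerator and denominator of the textbook Pearson formula separately. The helper name pearson_parts is made up for illustration, and the variable names r_num and r_den only mirror the ones in the warning above; this is not SciPy's actual source.

```python
import numpy as np

def pearson_parts(x, y):
    """Hypothetical helper: return (numerator, denominator) of Pearson's r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Deviations from the mean; all zeros when the input is constant
    xm = x - x.mean()
    ym = y - y.mean()
    r_num = np.sum(xm * ym)
    r_den = np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2))
    return r_num, r_den

# Constant inputs: both parts are zero, so r_num / r_den is 0/0
print(pearson_parts(np.ones(25), np.ones(25)))

# Inputs that vary: the denominator is nonzero and r is well defined
r_num, r_den = pearson_parts([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(r_num / r_den)
```

Since every deviation from the mean is zero for an array of ones, there is no variation to correlate, which is why NaN comes back.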
You can try your own function on similar data, as below (sourced from this SO posting):
import numpy as np

def pearsonr(X, Y):
    ''' Takes X & Y as numpy arrays,
    returns the Pearson correlation coefficient
    '''
    # Center X and Y (in place, so the inputs are modified)
    X -= X.mean(0)
    Y -= Y.mean(0)
    # Standardise X and Y (divides by zero when an input is constant)
    X /= X.std(0)
    Y /= Y.std(0)
    # Compute mean product
    return np.mean(X*Y)

A = np.ones([5,5]).flatten()
B = np.ones([5,5]).flatten()
print(pearsonr(A, B))
This still gives the same kind of warning as the pearsonr function in stats.py.
Also notice that the return value for r is NaN.
In [7]: runfile('C:/Users/a*/.spyder-py3/temp.py', wdir='C:/Users/a*/.spyder-py3')
nan
C:/Users/a*/.spyder-py3/temp.py:14:
RuntimeWarning: invalid value encountered in true_divide
X /= X.std(0)
C:/Users/amandr/.spyder-py3/temp.py:15:
RuntimeWarning: invalid value encountered in true_divide
Y /= Y.std(0)
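If you would rather get a number, you can return zero for r when the inputs are constant. A plain try - except does not catch a warning by itself, so first turn the warning into an exception (for example with np.errstate(invalid='raise') or the warnings module) and then catch it.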
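One way this could look is sketched below. The function name pearsonr_or_zero is hypothetical, and returning 0.0 for undefined cases is a convention assumed here, not something SciPy does. It relies on np.errstate to make NumPy raise FloatingPointError instead of emitting the "invalid value" warning.

```python
import numpy as np

def pearsonr_or_zero(x, y):
    """Hypothetical helper: Pearson's r, or 0.0 when r is undefined (0/0)."""
    # Work on flattened float copies so the caller's arrays are not modified
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xm = x - x.mean()
    ym = y - y.mean()
    r_num = np.sum(xm * ym)
    r_den = np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2))
    try:
        # Make NumPy raise FloatingPointError instead of warning on 0/0
        with np.errstate(invalid="raise", divide="raise"):
            return float(r_num / r_den)
    except FloatingPointError:
        # Assumed convention: report 0.0 when an input has no variation
        return 0.0

A = np.ones([5, 5])
B = np.ones([5, 5])
print(pearsonr_or_zero(A, B))
```

This only catches the exact 0/0 case. A constant array whose mean picks up rounding error may not produce an exact zero denominator, so checking the standard deviation against a small tolerance could be a more robust test.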
More on Pearson's correlation here and here.
Upvotes: 1