Reputation: 9830
The cor() function fails to compute the correlation if a vector contains extremely large numbers, and simply returns zero:
foo <- c(1e154, 1, 0)
bar <- c(0, 1, 2)
cor(foo, bar)
# -0.8660254
foo <- c(1e155, 1, 0)
cor(foo, bar)
# 0
Although 1e155 is very large, it is much smaller than the largest number R can represent. I'm surprised that R returns a wrong value instead of a more suitable result such as NA or Inf.
Is there a reason for this? How can we be sure we won't run into this situation in our programs?
Upvotes: 2
Views: 283
Reputation: 42659
Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations. (from http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient)
foo <- c(1e154, 1, 0)
sd(foo)
## [1] 5.773503e+153
foo <- c(1e155, 1, 0)
sd(foo)
## [1] Inf
And, even more fundamentally, to calculate sd() you need to square x:
1e154^2
## [1] 1e+308
1e155^2
## [1] Inf
So your number is indeed at the limit of what can be calculated with 64-bit double-precision floating point: the largest finite double is about 1.8e308, and the square of 1e155 exceeds it.
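To make the overflow concrete, here is a minimal sketch that writes Pearson's formula out by hand for the vectors from the question. It only illustrates the definition quoted above; it is not necessarily how cor() is implemented internally, and the variable names are made up for this example.

```r
# Largest finite double; anything larger becomes Inf.
xmax <- .Machine$double.xmax
# Squaring overflows once |x| exceeds roughly this value (about 1.34e154).
limit <- sqrt(xmax)

foo <- c(1e155, 1, 0)
bar <- c(0, 1, 2)

# Pearson's r by hand: covariance divided by the product of the sds.
cov_fb   <- cov(foo, bar)                 # finite: foo is never squared here
sd_foo   <- sd(foo)                       # Inf: the squared deviations overflow
manual_r <- cov_fb / (sd_foo * sd(bar))   # finite / Inf collapses to zero
```

A finite covariance divided by an infinite standard deviation gives zero, which would explain the 0 reported in the question rather than NA or Inf.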
Using R-2.15.2 on Windows I get:
cor(c(1e555, 1, 0), 1:3)
[1] NaN
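One possible workaround, sketched here under the assumption that all inputs are finite and non-missing, is to rescale each vector before calling cor(). Correlation is unchanged by multiplying a variable by a positive constant, so dividing by the largest absolute value keeps the squares within range. The helper name safe_cor is hypothetical and not part of base R.

```r
# Hypothetical helper (not part of base R): rescale, then correlate.
safe_cor <- function(x, y) {
  # Assumption: inputs contain only finite, non-missing values.
  stopifnot(all(is.finite(x)), all(is.finite(y)))
  sx <- max(abs(x))
  sy <- max(abs(y))
  # Dividing by a positive constant does not change the correlation.
  if (sx > 0) x <- x / sx
  if (sy > 0) y <- y / sy
  cor(x, y)
}

r <- safe_cor(c(1e155, 1, 0), c(0, 1, 2))
r   # about -0.866, in line with the 1e154 case from the question
```

This does not help when the values in a single vector span so many orders of magnitude that the small ones underflow after rescaling, but in that case they contribute almost nothing to the correlation anyway.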
Upvotes: 7