Reputation: 8753
Could you please explain to me the difference between these two cases?
> cor(1:10, rep(10,10))
[1] NA
Warning message:
In cor(1:10, rep(10, 10)) : the standard deviation is zero
> cor(1:10, 1:10)
[1] 1
The first one is a straight line, just like the second, so I would expect the correlation to be one. What am I not considering? Thanks
Upvotes: 0
Views: 422
Reputation: 32351
If you want to measure how closely the points fall on a line, in any direction, you can use one minus the ratio of the eigenvalues of the covariance matrix.
f <- function(x,y) {
  e <- eigen(var(cbind(x,y)))$values
  1 - e[2] / e[1]
}
# To have values closer to 0, you can square that quantity.
f <- function(x,y) {
  e <- eigen(var(cbind(x,y)))$values
  ( 1 - e[2] / e[1] )^2
}
f( 1:10, 1:10 )
f( 1:10, rep(1,10) )
f( rnorm(100), rnorm(100) ) # Close to 0
f( rnorm(100), 2 * rnorm(100) ) # Closer to 1
f( 2 * rnorm(100), rnorm(100) ) # Similar
The value is 1 if the points are aligned and 0 if the cloud they form is spherical. It is also non-negative, symmetric in x and y, and invariant under translations and rotations.
If your situation is not symmetric, i.e., if x and y do not play the same role, the regression-based approach suggested in Roland's comment makes more sense.
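The symmetry and invariance properties claimed for f can be checked numerically. The following is only a rough sketch: the seed, the rotation angle `theta`, the shift amounts, and the helper names `rot` and `rotated` are arbitrary choices made for illustration, not part of the answer above.

```r
# Squared version of f from above
f <- function(x, y) {
  e <- eigen(var(cbind(x, y)))$values
  (1 - e[2] / e[1])^2
}

set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(100)
y <- 2 * rnorm(100)

# Rotate the point cloud by an arbitrary angle
theta <- pi / 6
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
rotated <- cbind(x, y) %*% t(rot)

base_value    <- f(x, y)
rotated_value <- f(rotated[, 1], rotated[, 2])  # rotation
shifted_value <- f(x + 5, y - 3)                # translation
swapped_value <- f(y, x)                        # symmetry

# Each comparison should agree up to floating-point tolerance
all.equal(base_value, rotated_value)
all.equal(base_value, shifted_value)
all.equal(base_value, swapped_value)
```

The properties hold because rotating, shifting, or swapping the coordinates leaves the eigenvalues of the covariance matrix unchanged, and f depends only on those eigenvalues.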
Upvotes: 1
Reputation: 60462
Plot the data and it should be clear. The data set
## y doesn't vary
plot(1:10, rep(10,10))
is just a horizontal line. The correlation coefficient is undefined for a horizontal line because the estimated standard deviation of y is 0, and that standard deviation appears in the denominator of the correlation coefficient. In contrast,
plot(1:10, 1:10)
is the line y = x. Here both variables vary, so the correlation is well defined and equals 1.
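To make the denominator argument concrete, the correlation can be computed by hand as the covariance divided by the product of the two standard deviations. This is a small sketch; the variable names (`y_const`, `r_const`, `r_same`, and so on) are made up for illustration.

```r
x <- 1:10
y_const <- rep(10, 10)               # y never varies

cov_xy <- cov(x, y_const)            # 0: y does not move with x
sd_y   <- sd(y_const)                # 0: no spread in y

# Pearson correlation by hand: cov(x, y) / (sd(x) * sd(y))
r_const <- cov_xy / (sd(x) * sd_y)   # 0 / 0, which R evaluates to NaN

# When both variables vary, the same formula is well defined
r_same <- cov(x, x) / (sd(x) * sd(x))  # 1, up to floating-point rounding

r_const
r_same
```

The hand computation yields NaN where cor() reports NA with a warning, but the cause is the same: a zero standard deviation in the denominator.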
Upvotes: 2