Michele

Reputation: 8753

NA from correlation function

Could you please explain the difference between these two cases?

> cor(1:10, rep(10,10))
[1] NA
Warning message:
In cor(1:10, rep(10, 10)) : the standard deviation is zero

> cor(1:10, 1:10)
[1] 1

The first one is a straight line, just like the second, so I would expect its correlation to be one as well. What am I not considering? Thanks

Upvotes: 0

Views: 422

Answers (2)

Vincent Zoonekynd

Reputation: 32351

If you want to measure how closely the points lie on a line, you can use one minus the ratio of the smaller to the larger eigenvalue of the variance matrix.

f <- function(x,y) { 
  e <- eigen(var(cbind(x,y)))$values
  1 - e[2] / e[1]
}

# To have values closer to 0, you can square that quantity.
f <- function(x,y) { 
  e <- eigen(var(cbind(x,y)))$values
  ( 1 - e[2] / e[1] )^2
}
f( 1:10, 1:10 )
f( 1:10, rep(1,10) )
f( rnorm(100), rnorm(100) )     # Close to 0
f( rnorm(100), 2 * rnorm(100) ) # Closer to 1
f( 2 * rnorm(100), rnorm(100) ) # Similar

It is 1 if the points are aligned and 0 if the cloud they form has a spherical shape. It is also invariant under translations and rotations, non-negative, and symmetric in x and y.
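These properties can be checked numerically. The sketch below redefines the squared measure `f` so that it runs on its own, builds a noisy, roughly linear cloud (the seed, slope, noise level and 30-degree angle are arbitrary choices for illustration), and compares `f` before and after translating, rotating and swapping the coordinates.

```r
# Squared eigenvalue-ratio measure, as defined in the answer above
f <- function(x, y) {
  e <- eigen(var(cbind(x, y)))$values  # eigenvalues, returned in decreasing order
  (1 - e[2] / e[1])^2
}

set.seed(1)                            # arbitrary seed, only for reproducibility
x <- rnorm(100)
y <- 0.5 * x + rnorm(100, sd = 0.3)    # noisy, roughly linear cloud

theta <- pi / 6                        # rotate the cloud by 30 degrees
xr <- cos(theta) * x - sin(theta) * y
yr <- sin(theta) * x + cos(theta) * y

f(x, y)
f(x + 5, y - 3)        # translated: same value up to rounding
f(xr, yr)              # rotated: same value up to rounding
f(y, x)                # x and y swapped: same value up to rounding
f(1:10, rep(10, 10))   # the horizontal line from the question: 1
```

Unlike `cor()`, this measure gives 1 for the horizontal line, because it only looks at the shape of the cloud and never divides by the standard deviation of y alone.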

If your situation is not symmetric, i.e., if x and y do not play the same role, the regression-based approach suggested in Roland's comment makes more sense.
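Roland's comment is not reproduced on this page, so the following is only one hypothetical regression-based check, not necessarily the one he suggested: regress y on x and test whether all residuals are numerically zero. The helper name `on_a_line` and its relative tolerance are assumptions made for illustration.

```r
# Hypothetical helper: do the points lie on some line y = a + b * x?
# Fits y ~ x by least squares and checks that every residual is (numerically) zero.
on_a_line <- function(x, y, tol = 1e-8) {
  fit <- lm(y ~ x)
  # relative tolerance to absorb floating-point noise in the fit (arbitrary choice)
  max(abs(residuals(fit))) <= tol * max(1, abs(y))
}

on_a_line(1:10, 1:10)          # TRUE
on_a_line(1:10, rep(10, 10))   # TRUE: the slope is 0, but it is still a line
on_a_line(1:10, (1:10)^2)      # FALSE
```

This check is deliberately asymmetric: x is the predictor and y the response, which matches the situation where the two variables do not play the same role.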

Upvotes: 1

csgillespie

Reputation: 60462

Plot the data and it should be clear. The data set

## y doesn't vary
plot(1:10, rep(10,10))

is just a horizontal line. The correlation coefficient is undefined for a horizontal line, since the estimated standard deviation of y is 0 and it appears in the denominator of the correlation coefficient. By contrast,

plot(1:10, 1:10)

is the line:

y = x
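To see exactly where the zero ends up, here is the sample correlation written out by hand; `pearson_by_hand` is a hypothetical helper for illustration, not a base R function. With a constant y, both the numerator and the denominator are 0, so the ratio is 0/0, which R evaluates to `NaN`. `cor()` detects this case, warns that the standard deviation is zero, and returns `NA` instead.

```r
# Pearson correlation written out by hand (illustrative helper, not part of base R)
pearson_by_hand <- function(x, y) {
  dx <- x - mean(x)                    # deviations of x from its mean
  dy <- y - mean(y)                    # deviations of y from its mean
  num <- sum(dx * dy)                  # (n - 1) times the covariance
  den <- sqrt(sum(dx^2) * sum(dy^2))   # (n - 1) times sd(x) * sd(y)
  num / den
}

x <- 1:10
sd(rep(10, 10))                  # 0: y never varies
pearson_by_hand(x, x)            # 1
pearson_by_hand(x, rep(10, 10))  # NaN, i.e. 0 / 0
```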


Upvotes: 2
