Reputation: 95
I am trying to calculate the Pearson correlation between two vectors of data.
x = c(5,5,4,5,5,5)
y = c(0,5,0,3,5,4)
mx = mean(x)
my = mean(y)
newx = c(x-mx)
newy = c(y-my)
corr = (newx%*%t(newy)/sqrt((newx^2)%*%(sqrt(newy^2)))
My first major issue is that this correlation is meant to be calculated with 0 values ignored. However, I do not believe my final calculation would be possible if I were to omit them entirely.
If you know of a more elegant way to code this, or can see what I am doing incorrectly, I would greatly appreciate your help.
Upvotes: 0
Views: 596
Reputation: 18663
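You've got a few errors. First, you're missing a closing parenthesis. Second, the numerator is backwards: you want the transpose of the first vector, not the second, so that the product is a single number rather than a 6 x 6 matrix. Third, you forgot to sum in the denominator: it should be the product of the square roots of the two sums of squares.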
c(t(newx) %*% newy) / (sqrt(sum(newx^2)) * sqrt(sum(newy^2)))
#[1] 0.5991713
cor(x, y)
#[1] 0.5991713
Alternatively, you can use crossprod.
crossprod(newx, newy) / (sqrt(sum(newx^2)) * sqrt(sum(newy^2)))
#          [,1]
#[1,] 0.5991713
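On the question of ignoring 0 values: the snippets above use every pair, zeros included. If the zeros stand for missing observations (an assumption on my part, the question does not say so explicitly), one option is to drop any pair containing a 0 before centring. Here is a rough sketch. pearson_nonzero is a hypothetical helper name, and x2 and y2 are made-up data for illustration.

```r
# Hypothetical helper (not from the original posts): Pearson correlation
# over the pairs where neither value is 0, treating 0 as "missing".
pearson_nonzero <- function(x, y) {
  stopifnot(length(x) == length(y))
  keep <- x != 0 & y != 0            # pairs where both values are present
  x <- x[keep]
  y <- y[keep]
  if (length(x) < 2) return(NA_real_)
  dx <- x - mean(x)                  # centre each vector on its own mean
  dy <- y - mean(y)
  denom <- sqrt(sum(dx^2)) * sqrt(sum(dy^2))
  if (denom == 0) return(NA_real_)   # a constant vector has no defined correlation
  sum(dx * dy) / denom
}

x <- c(5, 5, 4, 5, 5, 5)
y <- c(0, 5, 0, 3, 5, 4)
pearson_nonzero(x, y)
# NA for this data: once the two zero pairs are dropped, x is all 5s

# Made-up data where both vectors still vary after filtering
x2 <- c(5, 3, 4, 0, 2, 5)
y2 <- c(4, 2, 0, 3, 1, 5)
keep2 <- x2 != 0 & y2 != 0
all.equal(pearson_nonzero(x2, y2), cor(x2[keep2], y2[keep2]))
```

With the vectors from the question, dropping the zeros leaves x constant, so the denominator is 0 and the correlation is undefined. That may be why omitting them entirely looked impossible. The helper returns NA in that case instead of dividing by zero.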
Upvotes: 2