Reputation: 534
I am trying to find the correlation coefficient in R between my dependent and independent variables.
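data("mtcars")
# Keep mpg, disp, hp, drat, wt and qsec (columns 1 and 3 to 7)
my_data <- mtcars[, c(1,3,4,5,6,7)]
# Pairwise Pearson correlations, rounded to two decimals
res <- cor(my_data)
round(res, 2)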
As a result, I got a correlation matrix in which some values are positive and some are negative.
For example, if the correlation coefficient between mpg and disp is -0.85, how can I tell which variable is decreasing and which one is increasing?
Upvotes: 2
Views: 1149
Reputation: 93791
Another way to think about this is that a correlation coefficient of -0.85 tells you that a one standard deviation increase (decrease) in either variable is associated with a 0.85 standard deviation decrease (increase) in the other variable. You can see this graphically using the code below.
The black line is the regression line for a regression of disp vs. mpg. This is related to the correlation coefficient because the regression slope equals the correlation coefficient times the standard deviation of disp divided by the standard deviation of mpg. (If we had switched the x and y variables and done lm(mpg ~ disp, data=mtcars), then the regression slope would be the correlation coefficient times the standard deviation of mpg divided by the standard deviation of disp.)
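plot(mtcars$mpg, mtcars$disp)
abline(lm(disp ~ mpg, data=mtcars))
# Vertical lines at the mean of mpg and at one standard deviation above it
abline(v=mean(mtcars$mpg) + c(0, sd(mtcars$mpg)), col="red", lty="11")
# Horizontal lines at the mean of disp and at the fitted disp for that higher mpg
abline(h=mean(mtcars$disp) + c(0, cor(mtcars$mpg, mtcars$disp)*sd(mtcars$disp)), col="red", lty="11")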
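The link between the regression slope and the correlation coefficient described above can be checked numerically. This is a minimal sketch using only base R and the built-in mtcars data; the variable names (r, sd_mpg, sd_disp, slope_disp_on_mpg, slope_mpg_on_disp) are chosen here for illustration.

```r
# Correlation and standard deviations from the built-in mtcars data
r <- cor(mtcars$mpg, mtcars$disp)
sd_mpg <- sd(mtcars$mpg)
sd_disp <- sd(mtcars$disp)

# Slope from regressing disp on mpg
slope_disp_on_mpg <- coef(lm(disp ~ mpg, data = mtcars))[["mpg"]]
# Slope from regressing mpg on disp
slope_mpg_on_disp <- coef(lm(mpg ~ disp, data = mtcars))[["disp"]]

# Each comparison should return TRUE (up to floating-point tolerance)
all.equal(slope_disp_on_mpg, r * sd_disp / sd_mpg)
all.equal(slope_mpg_on_disp, r * sd_mpg / sd_disp)
```

Both slopes carry the sign of the correlation coefficient, because standard deviations are always positive.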
You can standardize both variables (that is, scale the values so that they are in units of standard deviations away from the mean), which might make the relationship clearer. Now the correlation coefficient and the regression slope are exactly the same because both variables have been scaled to be in the same units. Note that a 1 standard deviation change in mpgS is associated with a -0.85 standard deviation change in dispS:
# Standardized versions of mpg and disp
mtcars$mpgS = (mtcars$mpg - mean(mtcars$mpg))/sd(mtcars$mpg)
mtcars$dispS = (mtcars$disp - mean(mtcars$disp))/sd(mtcars$disp)
plot(mtcars$mpgS, mtcars$dispS)
abline(lm(dispS ~ mpgS, data=mtcars))
abline(v=c(0,1), col="red", lty="11")
abline(h=c(0, cor(mtcars$mpg, mtcars$disp)), col="red", lty="11")
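As a quick check of the claim that the standardized slope equals the correlation coefficient, the sketch below uses base R's scale() instead of standardizing by hand. The names z, slope_z and r are illustrative only, and mtcars itself is left unmodified.

```r
# scale() centers each column and divides it by its standard deviation
z <- as.data.frame(scale(mtcars[, c("mpg", "disp")]))

# Slope of standardized disp regressed on standardized mpg
slope_z <- coef(lm(disp ~ mpg, data = z))[["mpg"]]
r <- cor(mtcars$mpg, mtcars$disp)

# Should return TRUE (up to floating-point tolerance)
all.equal(slope_z, r)
```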
You can also reverse the roles of mpg and disp in the graph and the result is equivalent:
plot(mtcars$dispS, mtcars$mpgS)
abline(lm(mpgS ~ dispS, data=mtcars))
abline(v=c(0,1), col="red", lty="11")
abline(h=c(0, cor(mtcars$mpg, mtcars$disp)), col="red", lty="11")
Bear in mind that the relationship implied by the correlation coefficient is based on the assumption of a linear relationship, as embodied by the regression lines in the graphs. If the relationship in the actual data is not linear (as appears to be the case here), the correlation coefficient (or, equivalently, a single-variable regression) might not provide good predictions of the values of the dependent variable.
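One informal way to look for the curvature mentioned above is to overlay a local smoother on the scatterplot and compare it with the straight regression line. This is only a sketch: lowess() from base R is one of several possible smoothers, and smooth_fit is a name introduced here.

```r
# Locally weighted smoother of disp against mpg
smooth_fit <- lowess(mtcars$mpg, mtcars$disp)

plot(mtcars$mpg, mtcars$disp)
abline(lm(disp ~ mpg, data = mtcars))   # straight-line fit
lines(smooth_fit, col = "blue")         # smoother, for comparison
```

If the blue curve bends away from the black line over part of the mpg range, the linear summary is less reliable there.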
Upvotes: 3
Reputation: 521093
Consider the following script, which just compares mpg and disp:
res1 <- cor(mtcars$mpg, mtcars$disp)
res2 <- cor(mtcars$disp, mtcars$mpg)
round(res1, 2)
round(res2, 2)
The output from both calls is -0.85. In other words, the nature of the correlation coefficient is not about the order of one variable against the other. Rather, a negative correlation coefficient means that as mpg increases, disp tends to decrease. And we could also phrase this by saying that as disp increases, mpg tends to decrease.
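To make "tends to decrease" concrete, one rough illustration is to split the cars at the mean mpg and compare the average disp in each group. The split at the mean and the names high_mpg, disp_high and disp_low are choices made here for illustration, not part of the correlation calculation itself.

```r
# TRUE for cars whose mpg is above the sample mean
high_mpg <- mtcars$mpg > mean(mtcars$mpg)

disp_high <- mean(mtcars$disp[high_mpg])    # average disp among higher-mpg cars
disp_low  <- mean(mtcars$disp[!high_mpg])   # average disp among lower-mpg cars

# TRUE: higher-mpg cars have the smaller average disp
disp_high < disp_low
```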
Upvotes: 2