sara

Reputation: 534

How to interpret correlation coefficient

I am trying to find the correlation coefficient in R between my dependent and independent variables.

data("mtcars")
# Columns 1 and 3:7 are mpg, disp, hp, drat, wt, qsec
my_data <- mtcars[, c(1,3,4,5,6,7)]
res <- cor(my_data)
round(res, 2)

As a result, I got a correlation matrix in which some coefficients are positive and some are negative.

For example, if the correlation coefficient between mpg and disp is -0.85, how can I know which variable is decreasing and which one is increasing?

Upvotes: 2

Views: 1149

Answers (2)

eipi10

Reputation: 93791

Another way to think about this is that a correlation coefficient of -0.85 tells you that a one-standard-deviation increase (decrease) in either variable is associated, on average, with a 0.85-standard-deviation decrease (increase) in the other variable. You can see this graphically using the code below.

The black line is the regression line for a regression of disp on mpg. This is related to the correlation coefficient because the regression slope equals the correlation coefficient times the standard deviation of disp divided by the standard deviation of mpg. (If we had switched the x and y variables and run lm(mpg ~ disp, data=mtcars), the regression slope would be the correlation coefficient times the standard deviation of mpg divided by the standard deviation of disp.)

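# Scatterplot with the fitted regression line of disp on mpg
plot(mtcars$mpg, mtcars$disp)
abline(lm(disp ~ mpg, data=mtcars))
# Dotted red verticals: mean of mpg and mean + 1 SD of mpg
abline(v=mean(mtcars$mpg) + c(0, sd(mtcars$mpg)), col="red", lty="11")
# Dotted red horizontals: mean of disp and mean + r * SD of disp
abline(h=mean(mtcars$disp) + c(0, cor(mtcars$mpg, mtcars$disp)*sd(mtcars$disp)), col="red", lty="11")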

[Figure: scatterplot of disp vs. mpg with the fitted regression line and dotted red reference lines]
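If you prefer numbers to pictures, the two claims above (slope = r × sd(disp) / sd(mpg), and "1 SD up in mpg goes with about -0.85 SD in disp") can be checked directly. This is just a sketch; the names `fit`, `slope`, `x0` and `shift_in_sds` are illustrative and not part of the answer's code:

```r
data(mtcars)

r <- cor(mtcars$mpg, mtcars$disp)

# Slope of disp ~ mpg should equal r * sd(disp) / sd(mpg)
fit <- lm(disp ~ mpg, data = mtcars)
slope <- unname(coef(fit)["mpg"])
all.equal(slope, r * sd(mtcars$disp) / sd(mtcars$mpg))  # TRUE

# Moving mpg from its mean to one SD above its mean shifts the
# predicted disp by r * sd(disp), i.e. by r SDs of disp
x0 <- mean(mtcars$mpg)
pred <- predict(fit, newdata = data.frame(mpg = c(x0, x0 + sd(mtcars$mpg))))
shift_in_sds <- unname(diff(pred)) / sd(mtcars$disp)
round(shift_in_sds, 2)
```

The last value is the same number as `round(r, 2)`, which is exactly the "one SD in, r SDs out" reading of the coefficient.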

You can standardize both variables (that is, rescale the values so that they are measured in standard deviations from the mean), which may make the relationship clearer. Now the correlation coefficient and the regression slope are exactly the same, because both variables are on the same scale. Note that a 1 standard deviation change in mpgS is associated with a -0.85 standard deviation change in dispS:

# Standardized versions of mpg and disp
mtcars$mpgS = (mtcars$mpg - mean(mtcars$mpg))/sd(mtcars$mpg)
mtcars$dispS = (mtcars$disp - mean(mtcars$disp))/sd(mtcars$disp)

plot(mtcars$mpgS, mtcars$dispS)
abline(lm(dispS ~ mpgS, data=mtcars))
abline(v=c(0,1), col="red", lty="11")
abline(h=c(0, cor(mtcars$mpg, mtcars$disp)), col="red", lty="11")

[Figure: the same plot for the standardized variables, dispS vs. mpgS]
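A quick numeric check of the standardized case, as a sketch: it keeps the standardized values in standalone vectors (an assumption, to avoid adding columns to `mtcars`) and compares the hand-rolled standardization with base R's `scale()`:

```r
data(mtcars)

# Standardize by hand (as above) and with base R's scale()
mpgS  <- (mtcars$mpg  - mean(mtcars$mpg))  / sd(mtcars$mpg)
dispS <- (mtcars$disp - mean(mtcars$disp)) / sd(mtcars$disp)
all.equal(mpgS, as.numeric(scale(mtcars$mpg)))  # TRUE

# Regress one standardized variable on the other:
# the intercept is (numerically) zero and the slope equals r
fitS <- lm(dispS ~ mpgS)
coef(fitS)
all.equal(unname(coef(fitS)["mpgS"]), cor(mtcars$mpg, mtcars$disp))  # TRUE
```

The intercept vanishes because both standardized variables have mean 0, and the regression line always passes through the point of means.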

You can also reverse the roles of mpg and disp in the graph. Because the correlation coefficient is symmetric, the slope of the standardized regression is again -0.85:

plot(mtcars$dispS, mtcars$mpgS)
abline(lm(mpgS ~ dispS, data=mtcars))
abline(v=c(0,1), col="red", lty="11")
abline(h=c(0, cor(mtcars$mpg, mtcars$disp)), col="red", lty="11")

[Figure: the standardized plot with the axes reversed, mpgS vs. dispS]
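To see the symmetry without a plot, here is a small sketch (variable names like `b_disp_on_mpg`, `b1`, `b2` are illustrative). On the standardized scale both regression slopes equal r; on the raw scale they differ, but, following the slope formula above, their product is r²:

```r
data(mtcars)
mpgS  <- as.numeric(scale(mtcars$mpg))
dispS <- as.numeric(scale(mtcars$disp))

# Standardized: both directions give the same slope, r
b_disp_on_mpg <- unname(coef(lm(dispS ~ mpgS))[2])
b_mpg_on_disp <- unname(coef(lm(mpgS ~ dispS))[2])
all.equal(b_disp_on_mpg, b_mpg_on_disp)  # TRUE

# Raw scale: r * sd(disp)/sd(mpg) times r * sd(mpg)/sd(disp) = r^2
b1 <- unname(coef(lm(disp ~ mpg, data = mtcars))[2])
b2 <- unname(coef(lm(mpg ~ disp, data = mtcars))[2])
all.equal(b1 * b2, cor(mtcars$mpg, mtcars$disp)^2)  # TRUE
```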

Bear in mind that the relationship implied by the correlation coefficient assumes a linear relationship, as embodied by the regression lines in the graphs. If the relationship in the actual data is not linear (as appears to be the case here), the correlation coefficient (or, equivalently, a single-variable regression) may not give good predictions of one variable from the other.
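One rough way to probe the non-linearity is sketched below. For a single-predictor linear fit, R² is exactly r², so you can compare it with a model that allows some curvature. The quadratic term here is only an illustrative choice, not the only (or necessarily the right) way to model this data, and a higher in-sample R² for the larger model is guaranteed and does not by itself prove it is better:

```r
data(mtcars)
r <- cor(mtcars$mpg, mtcars$disp)

linear    <- lm(disp ~ mpg, data = mtcars)
quadratic <- lm(disp ~ mpg + I(mpg^2), data = mtcars)  # illustrative curved fit

# For a single-predictor linear fit, R^2 equals r^2
all.equal(summary(linear)$r.squared, r^2)  # TRUE

# Compare how much variation each model explains
summary(linear)$r.squared
summary(quadratic)$r.squared
```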

Upvotes: 3

Tim Biegeleisen

Reputation: 521093

Consider the following script, which just compares mpg and disp:

res1 <- cor(mtcars$mpg,  mtcars$disp)
res2 <- cor(mtcars$disp, mtcars$mpg)
round(res1, 2)
round(res2, 2)

The output from both calls is -0.85. In other words, the correlation coefficient does not depend on the order of the two variables; it is symmetric. A negative correlation coefficient means that as mpg increases, disp tends to decrease. Equivalently, as disp increases, mpg tends to decrease.
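The reason it is symmetric is visible in the formula r = cov(x, y) / (sd(x) · sd(y)): swapping x and y changes neither the covariance nor the product of the standard deviations. A small sketch of that, plus the same check on the full matrix from the question (the short names `x`, `y`, `r_xy`, `r_yx` are just for illustration):

```r
data(mtcars)
x <- mtcars$mpg
y <- mtcars$disp

# Pearson r computed by hand, in both orders
r_xy <- cov(x, y) / (sd(x) * sd(y))
r_yx <- cov(y, x) / (sd(y) * sd(x))
all.equal(r_xy, r_yx)       # TRUE
all.equal(r_xy, cor(x, y))  # TRUE

# The correlation matrix from the question is symmetric, so
# res["mpg", "disp"] and res["disp", "mpg"] are the same number
my_data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
res <- cor(my_data)
isSymmetric(res)                                 # TRUE
all.equal(res["mpg", "disp"], res["disp", "mpg"])  # TRUE
```

So the sign only tells you the direction in which the two variables move together, not which one "drives" the other.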

Upvotes: 2
