Ajay Singh

Reputation: 309

Checking correlation between numeric and boolean data in R

I'm a beginner in R. I have learned how to check the correlation between numeric variables.

However, I cannot find details on how to check the correlation between a numeric variable and a boolean one. Can anybody give me tips or guide me on this?

Thanks in advance!

Upvotes: 3

Views: 2642

Answers (2)

Sven Hohenstein

Reputation: 81693

I suppose you are looking for the point-biserial correlation. Install the package ltm. It includes the function biserial.cor.

x <- rnorm(10)
y <- rep(c(0,1), 5)

library(ltm)
biserial.cor(x,y)
#[1] -0.08279833

See ?biserial.cor for details.

The result has the opposite sign and a slightly different magnitude compared with the one obtained with the built-in cor function:

cor(x,y)
#[1] 0.0872771
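
To see where the difference might come from, here is a minimal base-R sketch of the point-biserial formula. It assumes the textbook definition (difference of group means, scaled by the standard deviation of x and by sqrt(p * (1 - p))); the names m1, m0, p, sd_pop, r_pb and r_sample_sd are only illustrative, and the seed is arbitrary. With the population standard deviation the formula reproduces cor(x, y) exactly; with the sample standard deviation from sd() the value shrinks by sqrt((n - 1) / n), which for n = 10 is consistent with the magnitudes shown above. The sign most likely depends on which level of y is treated as the reference (see the level argument in ?biserial.cor).

```r
# Point-biserial correlation computed by hand, base R only.
set.seed(1)                 # arbitrary seed, only for reproducibility
x <- rnorm(10)              # numeric variable
y <- rep(c(0, 1), 5)        # dichotomous variable coded 0/1

n  <- length(x)
m1 <- mean(x[y == 1])       # mean of x where y is 1
m0 <- mean(x[y == 0])       # mean of x where y is 0
p  <- mean(y == 1)          # proportion of observations with y == 1

# Population standard deviation of x (divides by n, not n - 1)
sd_pop <- sqrt(mean((x - mean(x))^2))

# Point-biserial coefficient with the population SD
r_pb <- (m1 - m0) / sd_pop * sqrt(p * (1 - p))
all.equal(r_pb, cor(x, y))  # TRUE: same value as Pearson's r

# Same formula with the sample SD from sd(): smaller by sqrt((n - 1) / n)
r_sample_sd <- (m1 - m0) / sd(x) * sqrt(p * (1 - p))
all.equal(r_sample_sd, r_pb * sqrt((n - 1) / n))  # TRUE
```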

Upvotes: 3

csgillespie

Reputation: 60462

This answers your question:

##x is logical, i.e. TRUE or FALSE
R> x = sample(c(TRUE, FALSE), 10, replace=TRUE)
##y is numeric
R> y = runif(10)

##When we use correlation
##R converts TRUE to 1 and FALSE to 0.
R> cor(x, y)
[1] -0.5514

The obvious question is: should you be doing this? Remember, correlation tests for a linear relationship between x and y, i.e. as x increases, y changes in a linear manner. That doesn't occur in your scenario, because x takes only two values. As the answer by @Sven indicates, you want to use the point-biserial correlation method.
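
One way to interpret cor(x, y) when x is logical is as a comparison of the mean of y between the TRUE and FALSE groups. The sketch below is only an illustration with made-up data (the seed, the sample size of 30 and the names ct and tt are assumptions): the test of the correlation from cor.test gives the same p-value as a two-sample t-test with equal variances.

```r
set.seed(42)                                  # arbitrary seed
x <- sample(c(TRUE, FALSE), 30, replace = TRUE)  # logical variable
y <- runif(30)                                # numeric variable

# cor.test needs numeric input, so convert TRUE/FALSE to 1/0 explicitly
ct <- cor.test(as.numeric(x), y)

# Two-sample t-test of y by group, assuming equal variances
tt <- t.test(y ~ x, var.equal = TRUE)

# Same p-value; the t statistics agree up to sign
all.equal(ct$p.value, tt$p.value)
all.equal(abs(unname(ct$statistic)), abs(unname(tt$statistic)))
```

So asking whether the correlation is non-zero is, in this setting, the same as asking whether the two group means differ.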


If your data is a character vector, say:

x = c("M", "F") 

then you would need to do an additional step:

x[x=="M"] = 1
x[x=="F"] = 0
x = as.numeric(x)  ## x is still character after the recoding
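
A more compact alternative, sketched here with a made-up four-element vector (x_num is just an illustrative name), is to compare against one of the labels and convert the resulting logical vector to numeric:

```r
x <- c("M", "F", "F", "M")       # example character vector
x_num <- as.numeric(x == "M")    # 1 where x is "M", 0 otherwise
x_num
```

This avoids the intermediate character values and leaves the original x unchanged.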

Upvotes: 2
