Ruthger Righart

Reputation: 4921

How to use `cor.test` for correlation of specific columns?

I have the following data example:

A <- rnorm(100)
B <- rnorm(100)
C <- rnorm(100)

v1 <- as.numeric(1:100)
v2 <- as.numeric(2:101)
v3 <- as.numeric(3:102)
v2[50] <- NA
v3[60] <- NA
v3[61] <- NA

df <- data.frame(A, B, C, v1, v2, v3)

As you can see, df has one NA in column 5 and two NAs in column 6. I would like to compute a correlation matrix between columns 1 and 3 on the one hand and columns 2, 4, 5, and 6 on the other. Using the cor function in R:

cor(df[ , c(1,3)], df[ , c(2,4,5,6)], use="complete.obs")

#             B         v1         v2         v3
# A -0.007565203 -0.2985090 -0.2985090 -0.2985090
# C  0.032485874  0.1043763  0.1043763  0.1043763

This works. However, I want both the estimate and the p-value, so I switched to cor.test:

cor.test(df[ ,c(1,3)], df[ , c(2,4,5,6)], na.action = "na.exclude")$estimate

This fails with the error that 'x' and 'y' must have the same length. The error occurs with or without NAs in the data. It seems that cor.test, unlike cor, does not accept a set of columns on each side. Is there a solution to this problem?

Upvotes: 3

Views: 4438

Answers (1)

Backlin

Reputation: 14872

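cor.test compares one pair of numeric vectors at a time, so it cannot take two sets of columns directly. You can use outer to run the test on every pair of columns. Inside the function, X and Y are the selected columns of df, each recycled into 8 columns so that every pair lines up.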

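outer(df[, c(1,3)], df[, c(2,4,5,6)], function(X, Y){
    # mapply walks the 8 column pairs; cor.test on two vectors appears to drop
    # incomplete pairs by itself, so na.action likely has no effect here
    mapply(function(...) cor.test(..., na.action = "na.exclude")$estimate,
           X, Y)
})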

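The output even has the same layout as that of cor. The numbers differ from those in the question because the random data were generated anew, and the v1, v2 and v3 columns are no longer identical, presumably because each pair drops only its own missing rows: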

           B          v1          v2          v3
A 0.07844426  0.01829566  0.01931412  0.01528329
C 0.11487140 -0.14827859 -0.14900301 -0.15534569
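Since the question asks for p-values as well, here is a minimal sketch of how the same idea could return either field. It uses nested sapply rather than outer, which is a swapped-in approach and not the code above. The helper name pairwise_cor_test, its field argument, and the seed are assumptions made for illustration.

```r
# Rebuild the example data; the seed is an assumption added for reproducibility
set.seed(1)
df <- data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100),
                 v1 = as.numeric(1:100),
                 v2 = as.numeric(2:101),
                 v3 = as.numeric(3:102))
df$v2[50] <- NA
df$v3[c(60, 61)] <- NA

# Hypothetical helper: run cor.test on every (x column, y column) pair and
# collect one field of the result ("estimate" or "p.value") into a matrix
pairwise_cor_test <- function(x, y, field) {
  sapply(y, function(y_col) {
    sapply(x, function(x_col) {
      # unname() strips the "cor" label so the row names stay "A" and "C"
      unname(cor.test(x_col, y_col)[[field]])
    })
  })
}

est  <- pairwise_cor_test(df[, c(1, 3)], df[, c(2, 4, 5, 6)], "estimate")
pval <- pairwise_cor_test(df[, c(1, 3)], df[, c(2, 4, 5, 6)], "p.value")

est   # rows A and C, columns B, v1, v2, v3
pval  # same layout, holding the p-values
```

Each pair is tested on its own complete rows, so the v2 and v3 columns may differ slightly from cor(..., use = "complete.obs") applied to the whole data frame, which drops any row with an NA in any selected column.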

Upvotes: 3
