Atticus29
Atticus29

Reputation: 4412

A matrix version of cor.test()

Cor.test() takes vectors x and y as arguments, but I have an entire matrix of data that I want to test, pairwise. Cor() takes this matrix as an argument just fine, and I'm hoping to find a way to do the same for cor.test().

The common advice from other folks seems to be to use cor.prob():

https://stat.ethz.ch/pipermail/r-help/2001-November/016201.html

But these p-values are not the same as those generated by cor.test()!!! Cor.test() also seems better equipped to handle pairwise deletion (I have quite a bit of missing data in my data set) than cor.prob().

Does anybody have any alternatives to cor.prob()? If the solution involves nested for loops, so be it (I'm new enough to R for even this to be problematic for me).

Upvotes: 34

Views: 46932

Answers (5)

user10412376
user10412376

Reputation: 1

"The accepted solution (corr.test function in the psych package) works, but is extremely slow for large matrices."

If you use ci=FALSE, then the speed is much faster. By default, confidence intervals are found. However, this leads to a slight slowdown of speed. So, for just the rs, ts and ps, set ci=FALSE.

Upvotes: -1

Nick Clark
Nick Clark

Reputation: 141

The accepted solution (corr.test function in the psych package) works, but is extremely slow for large matrices. I was working with a gene expression matrix (~20,000 by ~1,000) correlated to a drug sensitivity matrix (~1,000 by ~500) and I had to stop it because it was taking forever.

I took some code from the psych package and used the cor() function directly instead and got much better results:

# find (pairwise complete) correlation matrix between two matrices x and y
# compare to corr.test(x, y, adjust = "none")
n <- t(!is.na(x)) %*% (!is.na(y)) # same as count.pairwise(x,y) from psych package
r <- cor(x, y, use = "pairwise.complete.obs") # MUCH MUCH faster than corr.test()
cor2pvalue = function(r, n) {
  t <- (r*sqrt(n-2))/sqrt(1-r^2)
  p <- 2*(1 - pt(abs(t),(n-2)))
  se <- sqrt((1-r*r)/(n-2))
  out <- list(r, n, t, p, se)
  names(out) <- c("r", "n", "t", "p", "se")
  return(out)
}
# get a list with matrices of correlation, pvalues, standard error, etc.
result = cor2pvalue(r,n)

Even with two 100 x 200 matrices, the difference was staggering. A second or two versus 45 seconds.

> system.time(test_func(x,y))
   user  system elapsed 
  0.308   2.452   0.130 
> system.time(corr.test(x,y, adjust = "none"))
   user  system elapsed 
 45.004   3.276  45.814 

Upvotes: 5

Sacha Epskamp
Sacha Epskamp

Reputation: 47551

corr.test in the psych package is designed to do this:

library("psych")
data(sat.act)
corr.test(sat.act)

As noted in the comments, to replicate the p-values from the base cor.test() function over the entire matrix, then you need to turn off adjustment of the p-values for multiple comparisons (the default is to use Holm's method of adjustment):

 corr.test(sat.act, adjust = "none")

[But be careful when interpreting those results!]

Upvotes: 50

CoderGuy123
CoderGuy123

Reputation: 6649

Probably the easiest way is to use the rcorr() from Hmisc. It will only take a matrix, so use rcorr(as.matrix(x)) if your data is in a data.frame. It will return you a list with: 1) matrix of r pairwise, 2) matrix of pairwise n, 3) matrix of p values for the r's. It automatically ignores missing data.

Ideally, a function of this kind should take data.frames too and also output confidence intervals in line with the 'New Statistics'.

Upvotes: 8

Tyler Rinker
Tyler Rinker

Reputation: 109874

If you're strictly after the pvalues in a matrix format from cor.test here's a solution shamelessly stolen from Vincent (LINK):

cor.test.p <- function(x){
    FUN <- function(x, y) cor.test(x, y)[["p.value"]]
    z <- outer(
      colnames(x), 
      colnames(x), 
      Vectorize(function(i,j) FUN(x[,i], x[,j]))
    )
    dimnames(z) <- list(colnames(x), colnames(x))
    z
}

cor.test.p(mtcars)

Note: Tommy also provides a faster solution though less easy to impliment. Oh and no for loops :)

Edit I have a function v_outer in my qdapTools package that makes this task pretty easy:

library(qdapTools)
(out <- v_outer(mtcars, function(x, y) cor.test(x, y)[["p.value"]]))
print(out, digits=4)  # for more digits

Upvotes: 15

Related Questions