Reputation: 4412
cor.test() takes vectors x and y as arguments, but I have an entire matrix of data that I want to test, pairwise. cor() takes this matrix as an argument just fine, and I'm hoping to find a way to do the same with cor.test().
The common advice from other folks seems to be to use cor.prob():
https://stat.ethz.ch/pipermail/r-help/2001-November/016201.html
But these p-values are not the same as those generated by cor.test()! cor.test() also seems better equipped than cor.prob() to handle pairwise deletion (I have quite a bit of missing data in my data set).
Does anybody have any alternatives to cor.prob()? If the solution involves nested for loops, so be it (I'm new enough to R for even this to be problematic for me).
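For reference, a minimal base-R sketch of the nested-for-loop approach I have in mind (pairwise_cor_p is just a made-up name): it builds a matrix of cor.test() p-values one pair of columns at a time, and since cor.test() itself drops incomplete pairs, missing data gets pairwise deletion for free.

```r
# Sketch: pairwise cor.test() p-values via nested for loops.
# cor.test() drops incomplete pairs, so NAs are handled pairwise.
pairwise_cor_p <- function(m) {
  k <- ncol(m)
  p <- matrix(NA_real_, k, k, dimnames = list(colnames(m), colnames(m)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      p[i, j] <- cor.test(m[, i], m[, j])$p.value
    }
  }
  p
}

p <- pairwise_cor_p(mtcars)
```

This is O(k^2) calls to cor.test(), so it will be slow for wide matrices, but it gives exactly the cor.test() p-values.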
Upvotes: 34
Views: 46932
Reputation: 1
"The accepted solution (corr.test function in the psych package) works, but is extremely slow for large matrices."
If you use ci = FALSE, it is much faster. By default, corr.test() also computes confidence intervals, which slows it down; if you only need the r, t, and p matrices, set ci = FALSE.
Upvotes: -1
Reputation: 141
The accepted solution (corr.test function in the psych package) works, but is extremely slow for large matrices. I was working with a gene expression matrix (~20,000 by ~1,000) correlated to a drug sensitivity matrix (~1,000 by ~500) and I had to stop it because it was taking forever.
I took some code from the psych package and used the cor() function directly instead and got much better results:
# find (pairwise complete) correlation matrix between two matrices x and y
# compare to corr.test(x, y, adjust = "none")
n <- t(!is.na(x)) %*% (!is.na(y))  # same as count.pairwise(x, y) from the psych package
r <- cor(x, y, use = "pairwise.complete.obs")  # MUCH MUCH faster than corr.test()

cor2pvalue <- function(r, n) {
  t <- (r * sqrt(n - 2)) / sqrt(1 - r^2)
  p <- 2 * (1 - pt(abs(t), n - 2))
  se <- sqrt((1 - r * r) / (n - 2))
  out <- list(r = r, n = n, t = t, p = p, se = se)
  return(out)
}

# get a list with matrices of correlations, p-values, standard errors, etc.
result <- cor2pvalue(r, n)
Even with two 100 x 200 matrices, the difference was staggering. A second or two versus 45 seconds.
> system.time(test_func(x, y))
   user  system elapsed
  0.308   2.452   0.130
> system.time(corr.test(x, y, adjust = "none"))
   user  system elapsed
 45.004   3.276  45.814
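As a sanity check (a sketch, not part of the original answer), the vectorized p-values should agree with base cor.test() pair by pair; cor2pvalue() is restated so the snippet is self-contained, and mx/my are just small illustrative matrices:

```r
# Verify the vectorized formula against cor.test() on a small example.
cor2pvalue <- function(r, n) {
  t <- (r * sqrt(n - 2)) / sqrt(1 - r^2)
  p <- 2 * (1 - pt(abs(t), n - 2))
  se <- sqrt((1 - r * r) / (n - 2))
  list(r = r, n = n, t = t, p = p, se = se)
}

mx <- as.matrix(mtcars[, c("mpg", "disp", "hp")])
my <- as.matrix(mtcars[, c("drat", "wt", "qsec")])
n <- t(!is.na(mx)) %*% (!is.na(my))
r <- cor(mx, my, use = "pairwise.complete.obs")
res <- cor2pvalue(r, n)

# compare one entry against base cor.test()
base_p <- cor.test(mtcars$mpg, mtcars$drat)$p.value
all.equal(unname(res$p["mpg", "drat"]), base_p)
```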
Upvotes: 5
Reputation: 47551
corr.test in the psych package is designed to do this:
library("psych")
data(sat.act)
corr.test(sat.act)
As noted in the comments, to replicate the p-values from the base cor.test() function over the entire matrix, you need to turn off adjustment of the p-values for multiple comparisons (the default is Holm's method):
corr.test(sat.act, adjust = "none")
[But be careful when interpreting those results!]
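A sketch of that equivalence, guarded so it only runs if psych is installed: with adjust = "none", each entry of corr.test()'s p matrix should match the corresponding cor.test() p-value.

```r
# Compare one corr.test() p-value (adjust = "none") with base cor.test().
if (requireNamespace("psych", quietly = TRUE)) {
  data(sat.act, package = "psych")
  ct <- psych::corr.test(sat.act, adjust = "none")
  base_p <- cor.test(sat.act$ACT, sat.act$SATV)$p.value
  print(all.equal(unname(ct$p["ACT", "SATV"]), base_p))
}
```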
Upvotes: 50
Reputation: 6649
Probably the easiest way is to use rcorr() from the Hmisc package. It only takes a matrix, so use rcorr(as.matrix(x)) if your data are in a data.frame. It returns a list with: 1) the matrix of pairwise r values, 2) the matrix of pairwise n values, and 3) the matrix of p-values for those r values. It automatically ignores missing data.
Ideally, a function of this kind should take data.frames too and also output confidence intervals in line with the 'New Statistics'.
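A sketch of that call, guarded so it only runs if Hmisc is installed:

```r
# rcorr() returns r, n, and P matrices in one list.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  res <- Hmisc::rcorr(as.matrix(mtcars))
  res$r  # pairwise correlations
  res$n  # pairwise sample sizes
  res$P  # p-values (diagonal is NA)
}
```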
Upvotes: 8
Reputation: 109874
If you're strictly after the p-values in matrix format from cor.test, here's a solution shamelessly stolen from Vincent (LINK):
cor.test.p <- function(x) {
  FUN <- function(x, y) cor.test(x, y)[["p.value"]]
  z <- outer(
    colnames(x),
    colnames(x),
    Vectorize(function(i, j) FUN(x[, i], x[, j]))
  )
  dimnames(z) <- list(colnames(x), colnames(x))
  z
}
cor.test.p(mtcars)
Note: Tommy also provides a faster solution, though it's less easy to implement. Oh, and no for loops :)
Edit: I have a function v_outer in my qdapTools package that makes this task pretty easy:
library(qdapTools)
(out <- v_outer(mtcars, function(x, y) cor.test(x, y)[["p.value"]]))
print(out, digits=4) # for more digits
Upvotes: 15