Reputation: 11
I am trying to find a fast way to calculate the correlation between a vector of values and a matrix. I have a data frame with 200 rows and 400,000 columns (one per observation, after transposing the data). I need to find the correlation between each column and every other column.
My code is below, but it is too slow. Can anyone come up with a faster way?
for (i in 1:400000) {
  # note: x is overwritten on every iteration, so only the last result is kept
  x <- cor(trainDataNew[, i], trainDataNew[, -i])
}
You don't need my data to do this. You can create random data like below.
norm1 <- rnorm(1000)
norm2 <- rnorm(1000)
norm3 <- rnorm(1000)
trainDataNew <- as.data.frame(cbind(norm1, norm2, norm3))
Upvotes: 1
Views: 519
Reputation: 226182
What's wrong with
cc <- cor(trainDataNew)
?
If you only want the lower triangle you can then use
cc2 <- cc[lower.tri(cc,diag=FALSE)]
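As a quick sanity check on a small made-up data frame (the sizes and the name `small` are just for illustration, not your real data), a single `cor()` call should give the same numbers as the column-by-column loop from the question, and the lower triangle should hold each pair of columns exactly once:

```r
set.seed(1)
small <- as.data.frame(matrix(rnorm(200 * 5), nrow = 200))  # toy data: 200 rows, 5 columns

cc <- cor(small)                       # full 5 x 5 correlation matrix in one call

# The loop body from the question, for column 1 only
loop1 <- cor(small[, 1], small[, -1])
all.equal(as.numeric(loop1), as.numeric(cc[1, -1]))

cc2 <- cc[lower.tri(cc, diag = FALSE)]
length(cc2) == choose(ncol(small), 2)  # one value per unordered pair of columns
```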
This blog post claims to have done a similar-sized (slightly smaller) problem in about a minute. Their approach is implemented in HiClimR::fastCor.
library(HiClimR)
# dd is assumed to be the data as a numeric matrix, e.g. as.matrix(trainDataNew)
system.time(cc <- fastCor(dd, nSplit = 10,
                          upperTri = TRUE, verbose = TRUE,
                          optBLAS = TRUE))
I haven't gotten this working yet (keep running out of memory), but you may have better luck. You should also look into linking R to an optimized BLAS, e.g. see here for MacOS.
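The memory problem is easy to see with some arithmetic: a full 400,000 x 400,000 matrix of doubles is far larger than typical RAM, so the result would have to be produced (and used or written out) in blocks. Below is a rough sketch of that idea on toy data. It is not HiClimR's actual implementation; the function name `block_cor`, the toy sizes, and the block size of 10 are all made up for illustration.

```r
# Size of the full result in gigabytes (8 bytes per double)
p <- 400000
p^2 * 8 / 1e9   # 1280 GB

# Correlations between a block of columns `idx` and all columns of `z`,
# where the columns of `z` have mean 0 and unit Euclidean norm
block_cor <- function(z, idx) crossprod(z[, idx, drop = FALSE], z)

set.seed(1)
m <- matrix(rnorm(200 * 50), nrow = 200)   # toy data: 200 rows, 50 columns
z <- scale(m) / sqrt(nrow(m) - 1)          # after this, crossprod(z) equals cor(m)

# Split the column indices into 5 blocks of 10 columns each
blocks <- split(seq_len(ncol(m)), ceiling(seq_len(ncol(m)) / 10))
# In a real run, each block would be summarized or written to disk here
res <- lapply(blocks, function(idx) block_cor(z, idx))

# Stacking the blocks reproduces the ordinary correlation matrix
all.equal(do.call(rbind, res), cor(m), check.attributes = FALSE)
```

Standardizing once and using `crossprod()` means each block is a plain matrix product, which is where an optimized BLAS would help. Since the blocks are independent of each other, the `lapply()` call could in principle be swapped for something like `parallel::mclapply()`.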
Someone here reports a parallelized version (code is here, along with some forked versions).
Upvotes: 2