Kyle

What is the fastest way to compute the correlation between a vector and a matrix in R?

I am trying to find a fast way to calculate the correlation between a vector of values and a matrix. After transposing the data, I have a data frame with 200 rows and 400,000 columns. I need to find the correlation between each column and every other column.

My code is below, but it is too slow. Can anyone come up with a faster way?

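# Slow: one cor() call per column; note that x is overwritten on every iteration
for (i in seq_len(ncol(trainDataNew))) {
  x <- cor(trainDataNew[, i], trainDataNew[, -i])
}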

You don't need my data to reproduce this; you can create random data like this:

norm1 <- rnorm(1000)
norm2 <- rnorm(1000)
norm3 <- rnorm(1000)
trainDataNew <- as.data.frame(cbind(norm1, norm2, norm3))

Answers (1)

Ben Bolker

What's wrong with

cc <- cor(trainDataNew)

?
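As a small sanity check on toy data like the question's (rebuilt here with a seed so it is reproducible), one `cor()` call returns every pairwise correlation at once, and row `i` of the result without its diagonal entry matches what the loop computes for column `i`. The loop version is kept only for comparison:

```r
set.seed(42)
trainDataNew <- data.frame(norm1 = rnorm(1000),
                           norm2 = rnorm(1000),
                           norm3 = rnorm(1000))

# One call: the full p x p correlation matrix
cc <- cor(trainDataNew)

# The question's approach, one column at a time, kept for comparison
loop_res <- lapply(seq_len(ncol(trainDataNew)), function(i) {
  cor(trainDataNew[, i], trainDataNew[, -i])
})

# Row i of cc without its diagonal entry equals the loop result for column i
for (i in seq_len(ncol(trainDataNew))) {
  stopifnot(isTRUE(all.equal(as.vector(loop_res[[i]]), unname(cc[i, -i]))))
}
```

The single call avoids 400,000 separate R-level calls and lets the work run in compiled code.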

If you only want the lower triangle, you can then use

cc2 <- cc[lower.tri(cc,diag=FALSE)]
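`cc2` is a plain vector of the p(p-1)/2 distinct pairs in column-major order, so it loses track of which pair each value belongs to. Calling `which(..., arr.ind = TRUE)` on the same logical mask gives matching row/column indices in the same order. A sketch on a small random matrix (the names `m`, `idx` and `pair_df` are just for illustration):

```r
set.seed(1)
m <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5,
            dimnames = list(NULL, paste0("v", 1:5)))
cc <- cor(m)

# Distinct pairs only (diagonal excluded), column-major order
cc2 <- cc[lower.tri(cc, diag = FALSE)]

# Row/column indices for each entry of cc2, in the same column-major order
idx <- which(lower.tri(cc, diag = FALSE), arr.ind = TRUE)
pair_df <- data.frame(var1 = colnames(cc)[idx[, "row"]],
                      var2 = colnames(cc)[idx[, "col"]],
                      r    = cc2)
head(pair_df)
```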

This blog post claims to have solved a problem of similar (slightly smaller) size in about a minute. Their approach is implemented in HiClimR::fastCor.
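One standard reason such approaches can be fast is that Pearson correlation reduces to a single matrix product: if every column is centred and scaled to unit standard deviation (giving `Z`), then `crossprod(Z) / (n - 1)` equals `cor(X)`, and that product is exactly the kind of operation an optimized BLAS accelerates. This is a minimal sketch of the identity on random data, not a description of how `fastCor` itself is written:

```r
set.seed(7)
n <- 200
X <- matrix(rnorm(n * 50), nrow = n)

# Centre each column and scale it to unit (n - 1) standard deviation
Z <- scale(X, center = TRUE, scale = TRUE)

# t(Z) %*% Z / (n - 1) is the Pearson correlation matrix
cc_fast <- crossprod(Z) / (n - 1)

# The same trick for one vector against every column (the question's task)
v <- X[, 1]
r_v <- crossprod((v - mean(v)) / sd(v), Z) / (n - 1)
```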

library(HiClimR)
# fastCor() works on a numeric matrix with one variable per column
dd <- as.matrix(trainDataNew)
system.time(cc <- fastCor(dd, nSplit = 10,
                          upperTri = TRUE, verbose = TRUE,
                          optBLAS = TRUE))

I haven't gotten this working yet (I keep running out of memory), but you may have better luck. You should also look into linking R to an optimized BLAS (e.g. see here for macOS).
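The memory problem follows from simple arithmetic. With 400,000 columns, the full correlation matrix has 400,000² double-precision entries at 8 bytes each, about 1.28 TB. Even the strict lower triangle alone is about 640 GB. A back-of-the-envelope check in R:

```r
p <- 400000                 # number of columns (variables) in the question
bytes_per_double <- 8

full_bytes  <- p^2 * bytes_per_double           # full p x p matrix
lower_bytes <- choose(p, 2) * bytes_per_double  # distinct pairs only

full_bytes  / 1e12   # size in terabytes
lower_bytes / 1e9    # size in gigabytes
```

So unless the machine has that much RAM, the results have to be computed in pieces and reduced or written out as they go, rather than held as one matrix.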

Someone here reports a parallelized version (the code is here, along with some forked versions).
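The general idea behind splitting and parallelizing can be sketched in base R. The steps are:

1. Cut the columns into blocks.
2. Compute the correlation for each pair of blocks independently. Each task needs only two column slices.
3. Assemble the pieces, using symmetry for the mirrored blocks.

This sketch uses `parallel::mclapply`, which forks and therefore only runs in parallel on Unix-alikes. Here it falls back to one core on Windows. The block size and helper names are arbitrary choices for illustration, and this is not the code from the linked post:

```r
library(parallel)

set.seed(3)
X <- matrix(rnorm(200 * 30), nrow = 200)
block_size <- 8

# Split column indices into consecutive blocks
blocks <- split(seq_len(ncol(X)), ceiling(seq_len(ncol(X)) / block_size))

# Block pairs with i <= j; the remaining ones follow by symmetry
block_pairs <- which(upper.tri(diag(length(blocks)), diag = TRUE),
                     arr.ind = TRUE)

# Forking is unavailable on Windows, so use a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L
pieces <- mclapply(seq_len(nrow(block_pairs)), function(k) {
  bi <- blocks[[block_pairs[k, "row"]]]
  bj <- blocks[[block_pairs[k, "col"]]]
  cor(X[, bi, drop = FALSE], X[, bj, drop = FALSE])
}, mc.cores = cores)

# Assemble the full matrix, filling each mirrored block by transposition
cc <- matrix(NA_real_, ncol(X), ncol(X))
for (k in seq_len(nrow(block_pairs))) {
  bi <- blocks[[block_pairs[k, "row"]]]
  bj <- blocks[[block_pairs[k, "col"]]]
  cc[bi, bj] <- pieces[[k]]
  cc[bj, bi] <- t(pieces[[k]])
}
```

For the real problem, you would write each piece to disk or reduce it instead of assembling `cc`, because the full matrix does not fit in memory.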
