Reputation: 147
Given a matrix named db.mtx.rnk, I'm calculating pairwise Kendall and Spearman correlations between its columns and saving the results in a square matrix. The problem is that the input matrix is quite big (~5000x5000), so the number of pairwise combinations is very large and the computation takes a long time. One option that would cut the time roughly in half is to calculate only the upper triangle, which I have not implemented yet, but that would still be slow. I would like to parallelize the computation. Any hints?
Current code:
# -- get pairwise column combinations
pairwise.permuts <- t(expand.grid(1:ncol(db.mtx.rnk), 1:ncol(db.mtx.rnk)))
# -- iterate over two stats of interest
for(stat in c("kendall", "spearman")){
  # -- kendall tau and spearman
  stats.vec <- apply(pairwise.permuts, 2, function(x) cor(db.mtx.rnk[,x[1]], db.mtx.rnk[,x[2]], method = stat))
  stats.mtx <- matrix(stats.vec, ncol = ncol(db.mtx.rnk))
  colnames(stats.mtx) <- colnames(db.mtx.rnk)
  rownames(stats.mtx) <- colnames(db.mtx.rnk)
}
Thanks
Upvotes: 0
Views: 150
Reputation: 10375
There are many ways to parallelise in R. Some options are the parallel, foreach and future packages. Given your code, the fewest changes are needed with the future-based package future.apply, because it provides the function future_apply, which takes the same arguments as apply. You have to call plan(multiprocess) to tell future that the calls should be evaluated in parallel; multiprocess uses separate R sessions or forking, depending on your OS. (In recent versions of future, multiprocess is deprecated; plan(multisession) or plan(multicore) selects one of those two backends explicitly.) This leads to the following code, which already speeds up a toy example on my machine:
library(future.apply)
# plan(multiprocess) in the original answer; it is deprecated in recent
# versions of future, and multisession works on every OS
plan(multisession)
for(stat in c("kendall", "spearman")){
  # -- kendall tau and spearman
  stats.vec <- future_apply(pairwise.permuts, 2, function(x) cor(db.mtx.rnk[,x[1]], db.mtx.rnk[,x[2]], method = stat))
  stats.mtx <- matrix(stats.vec, ncol = ncol(db.mtx.rnk))
  colnames(stats.mtx) <- colnames(db.mtx.rnk)
  rownames(stats.mtx) <- colnames(db.mtx.rnk)
}
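The upper-triangle idea from the question can be combined with this. What follows is only a minimal sketch, not a tuned implementation. The helper name pairwise_cor_upper and its lapply_fun argument are made up for illustration, and it assumes the matrix has at least two columns, none of them constant and none containing NAs. It correlates each unordered column pair once and mirrors the value into the lower triangle. For 5000 columns that is 12,497,500 pairs instead of the 25,000,000 produced by expand.grid.

```r
# Hypothetical helper (not from the original post): compute each unordered
# column pair once, then mirror the values into a symmetric matrix.
# Assumes ncol(mat) >= 2 and that no column is constant or contains NA.
pairwise_cor_upper <- function(mat, method = "spearman", lapply_fun = lapply) {
  p <- ncol(mat)
  # 2 x choose(p, 2) matrix; column k holds one pair (i, j) with i < j
  pairs <- combn(p, 2)
  # lapply_fun is the only place where the work could be parallelised
  vals <- unlist(lapply_fun(seq_len(ncol(pairs)), function(k) {
    cor(mat[, pairs[1, k]], mat[, pairs[2, k]], method = method)
  }))
  out <- diag(1, p)                 # a column correlated with itself is 1
  idx <- t(pairs)                   # one (row, col) index pair per row
  out[idx] <- vals                  # fill the upper triangle
  out[idx[, 2:1, drop = FALSE]] <- vals  # mirror into the lower triangle
  dimnames(out) <- list(colnames(mat), colnames(mat))
  out
}

# Sequential call on toy data:
# m <- matrix(rnorm(60), ncol = 5)
# pairwise_cor_upper(m, method = "kendall")
#
# Parallel call, assuming future.apply is installed:
# library(future.apply)
# plan(multisession)
# pairwise_cor_upper(db.mtx.rnk, method = "kendall", lapply_fun = future_lapply)
```

Passing the apply function as an argument keeps the helper usable without any parallel package. On small inputs the result should agree with cor(mat, method = ...) called on the whole matrix, which is a convenient sanity check before running it on the full data.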
Upvotes: 1