Reputation: 790
Which R package is the best for parallel computing?
I am interested in recommendations focused on processing time, vectorisation, and user-friendly syntax (looking for information valid as of January 2022, including novelties).
And here is the second part of the question:
I need to iterate through many (thousands of) vectors in a large list, so I'd like to speed that up.
Here is an example of what I can do. Inside the for loop I put a simple rescaling anonymous function just to provide the reprex; in real life I have more complicated computations to do. Can something like this be sped up? How?
```{r}
library(foreach)
library(doParallel)
# a dummy list of vectors:
set.seed(123)
v1 <- sample(600:800, 108, replace=TRUE)
v2 <- sample(600:800, 120, replace=TRUE)
v3 <- sample(550:800, 200, replace=TRUE)
v4 <- sample(640:800, 120, replace=TRUE)
v5 <- sample(700:810, 131, replace=TRUE)
v6 <- sample(600:800, 220, replace=TRUE)
v7 <- sample(600:850, 149, replace=TRUE)
v8 <- sample(530:800, 144, replace=TRUE)
v9 <- sample(600:810, 129, replace=TRUE)
v10 <- sample(600:860, 170, replace=TRUE)
list1 <- list()
list1[["first"]] <- v1
list1[["named"]] <- v2
list1[["vector"]] <- v3
list1[["out"]] <- v4
list1[["of"]] <- v5
list1[["many"]] <- v6
list1[["within"]] <- v7
list1[["this"]] <- v8
list1[["dummy"]] <- v9
list1[["list"]] <- v10
# this function rescales each vector in a given list to the 0-255 range
parallelism_test <- function(lst) {
  # split the list indices into chunks of at most 100 vectors
  index_list <- split(seq_along(lst), ceiling(seq_along(lst) / 100))
  # create an empty list for the rescaled data
  newlist <- list()
  # register the selected number of cores once, before the loop
  doParallel::registerDoParallel(cores = 5)
  # loop over the chunks; the vectors of each chunk are rescaled in parallel
  for (indx in seq_along(index_list)) {
    newlist <- c(newlist,
                 foreach::foreach(nam = names(lst)[index_list[[indx]]]) %dopar% {
                   x <- lst[[nam]]
                   (x - min(x)) / (max(x) - min(x)) * 255
                 })
  }
  return(newlist)
}
test <- parallelism_test(list1)
print(test)
```
I would appreciate any advice.
Upvotes: 0
Views: 949
Reputation: 8105
There are quite a number of packages for parallel computing, but I often prefer the built-in `parallel` package.
In this case the code is quite clean:
```{r}
vscale <- function(x) {
  (x - min(x)) / (max(x) - min(x)) * 255
}

library(parallel)
cl <- makeCluster(4)
list2 <- parLapply(cl, list1, vscale)
stopCluster(cl)  # shut the workers down again when done
```
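As a side note, on Linux or macOS the fork-based `mclapply()` from the same package avoids the explicit cluster setup entirely (a minimal sketch; it will not run in parallel on Windows, and `mc.cores = 4` is just an example value):
```{r}
library(parallel)
# forks the current R session instead of starting worker processes
list2 <- mclapply(list1, vscale, mc.cores = 4)
```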
Although in this specific case, I suspect that storing your data in, for example, a `data.table` and using that is in many cases faster. `data.table` will also use multiple threads if necessary. In the example below, I first create the `data.table` from the original list. It would, of course, be better in this case to make sure the data ends up in a `data.table` from the beginning.
```{r}
library(data.table)
dta <- lapply(names(list1),
              function(col) data.table(group = col, value = list1[[col]]))
dta <- rbindlist(dta)
# rescale per group, so each original vector keeps its own min/max
dta[, value := vscale(value), by = group]
```
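`data.table` picks the number of threads automatically, but you can inspect and cap it explicitly if you want control over it:
```{r}
getDTthreads()   # how many threads data.table currently uses
setDTthreads(4)  # cap it at 4, for example
```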
Upvotes: 1