Tamás F

Reputation: 45

foreach instead of for loop

I have a large dataset, and I wrote code that performs the same calculations on it in a rolling manner inside a for loop. My problem is that the code runs for a long time. From what I've read, this is probably because R runs single-threaded by default. As far as I know, the foreach package could speed up execution considerably, but I am unsure how to implement it. Currently my code looks like this: in every iteration I subset a chunk of the large dataset and do various calculations on those subsets. At the end of an iteration, I collect the output in a time series. Is it possible to apply foreach in this situation?

for (k in seq(1, 5284, 21)) {
  fdata <- data[k:(k+251), ]
  tdata <- data[(k+252):(k+377), ]
  # ... calculations on fdata/tdata, output collected in a time series
}

Thanks!

Upvotes: 1

Views: 56

Answers (1)

Martin C. Arnold

Reputation: 9668

This is certainly doable using foreach. Depending on your OS, you first have to load a suitable parallel backend (e.g. doSNOW on a Windows machine) and then set up a cluster.

Example:

library(foreach)
library(doSNOW)

# set number of cores/CPUs to be used
(n_cores <- parallel::detectCores() - 1)

# some example data
dat <- matrix(1:1e3, ncol = 10)

# the set of indices to iterate over
k <- 1:99

# run stuff in parallel
cl <- makeCluster(n_cores)
registerDoSNOW(cl)

# note: foreach needs a *named* iteration variable, e.g. foreach(i = k)
result <- foreach(i = k) %dopar% {
  fdata <- dat[i:(i+1), ]
  # do computationally expensive stuff with `fdata`
  # ... and return something
  cumsum(fdata[1, ] + fdata[2, ])
}

stopCluster(cl)

By default, result will be a list of the per-iteration results. There are, however, ways to combine them into an array or similar structure. Look at the details on the .combine argument in ?foreach.
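As a small illustration of .combine (a sketch with made-up data, run sequentially via %do% so no cluster is needed), rbind stacks each iteration's returned vector into a matrix instead of a list:

```r
library(foreach)

# each iteration returns a numeric vector of length 3;
# .combine = rbind stacks them row-wise into a 3x3 matrix
m <- foreach(i = 1:3, .combine = rbind) %do% {
  i * c(1, 2, 3)
}
```

The same .combine argument works unchanged with %dopar% once a backend is registered.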

Upvotes: 1
