Reputation: 2874
I would like to run a large loop with the foreach
function.This means using the %dopar%
operator.
I can't find any questions already answered to this problem exactly. If this is a duplicate though, please point me in the right direction and I'll close this question.
I have been having mixed success. It works for simple examples on my machine, as per the help documentation, however I cannot seem to get good results for my own work.
My example is slightly more complicated, so the devil seems to be in the detail, as always! I have also read the 'white paper' provided by the package creators Revolution Analytics (you can get it here).
I don't see how best to maybe use the .combine
argument to apply results to my global output list.
I would like to assign claculated value to one big list as opposed to using cbind
or c
My example is pretty convoluted, but if I simplify it any further then any answers might not address my issue.
I will perform a kind of moving-linear model. So fit a model using lm() over 50 obersvations [1:50], predict the 51st observation [51], saving the results to a list.
Then I will shift it all one observation further. So a lm
over [2:51] and predict the 52nd observation [52].
I will use a total of 100 observations, so I can make a maximum of 50 predictions.
## ============================================ ##
## Setup the backend for the foreach function ##
## ============================================ ##
## doMC calls upon cores on demand, uses them and closes them
library(doMC)
registerDoMC(detectCores()) #detectCores() uses all cores
## for Windows users
#library(doParallel) --> for Windows users
#registerDoParallel(detectCores())
## ======================== ##
## Create some dummy data ##
## ======================== ##
## three columns, one hundred observations
my_data <- data.table(outcome = runif(100), V1 = 3*runif(100), V2 = sqrt(runif(100)))
## Have a look at the data if you like - using the DT package
library(DT)
datatable(my_data, options = list(pageLength = nrow(my_data)))
## ================================= ##
## Perform the loop the normal way ##
## ================================= ##
## Create container (a list of lists) for results
my_results <- sapply(c(paste0("step_", seq(1:50))), function(x) NULL)
step_results <- sapply(c("coefs", "rank", "error"), function(x) NULL)
for(i in 1:length(my_results)){my_results[[i]] <- step_results}
## Use a for loop to stpe through all the 50 'slices'
for(i in 1:50) { #max. 50 predictions possible
## Fit a linear model
my_fit <- lm("outcome ~ V1 + V2", data = my_data[i:(i+49)])
## Predict the next step
my_pred <- predict(my_fit, newdata = my_data[i+50, .(V1, V2)])
error <- my_data$outcome[i+50] - my_pred #simply measure the delta to the actual value
## Assign some results to the container created earlier
my_results[[i]][[1]] <- my_fit$coefficients
my_results[[i]][[2]] <- my_fit$rank
my_results[[i]][[3]] <- error
}
str(my_results) ## Keep this container to compare to our next one
## ============================================ ##
## Perform the loop using foreach and %dopar% ##
## ============================================ ##
## Create same results object for results as previously for parallel results
par_results <- sapply(c(paste0("step_", seq(1:50))), function(x) NULL)
step_results <- sapply(c("coefs", "rank", "error"), function(x) NULL)
for(i in 1:length(par_results)){par_results[[i]] <- step_results}
my_results_par <- foreach(i = 1:50) %dopar%
{ #max. 50 predictions possible
my_fit <- lm("outcome ~ V1 + V2", data = my_data[i:(i+49)])
my_pred <- predict(my_fit, newdata = my_data[i+50, .(V1, V2)])
error <- my_data$outcome[i+50] - my_pred
## Assign some results to the container created earlier
par_results[[i]][[1]] <- my_fit$coefficients
par_results[[i]][[2]] <- my_fit$rank
par_results[[i]][[3]] <- error
Sys.sleep(i/20) #Allows time to see R processes spawn on your system
return(par_results)
}
## We can see straight away that this didn't work as I would like it to
identical(my_results, my_results_par) #FALSE
## This shows that the output seems good on the surface
class(my_results_par)
length(my_results_par)
## This shows that it doesn't (WARNING: very long)
str(my_results_par)
You can try out the various .combine
arguments in the foreach
function, for example:
foreach(i = 1:50, .combine = "c") {computation}
or
foreach(i = 1:50, .combine = "cbind") {computation}
these prodice a vector and a matrix respectively, but do not contain all the results that I was trying to save in each loop.
Questions
.combine
argument to create my desired output?I have read that you can supply a custom function to foreach
... might this be the way to do it? I still don't see how I would combine the results.
Upvotes: 0
Views: 2125
Reputation: 3888
Yes, this can easily be done. We can modify your code for the foreach
-step to the following, where we export the data.table
package to each worker.
my_results_par <- foreach(i = 1:50, .combine = append, .packages = c("data.table")) %dopar%
{
my_fit <- lm("outcome ~ V1 + V2", data = my_data[i:(i+49)])
my_pred <- predict(my_fit, newdata = my_data[i+50, .(V1, V2)])
error <- my_data$outcome[i+50] - my_pred
par_results <- list(
coefs = my_fit$coefficients,
rank = my_fit$rank,
error = error
)
par_results <- list(par_results)
names(par_results) <- paste0("step_", i)
return(par_results)
}
identical(my_results, my_results_par)
[1] TRUE
Upvotes: 1