Reputation: 1285
I have a function that needs to manipulate three data frames, each with a different structure:
a: The original data frame. It is a parameter of my function. I need to remove rows from it, given certain conditions.
b: A new data frame created in my function. My function adds all of its rows here.
c: Another new data frame created in my function. My function adds all of its rows here.
In order to try parallel processing, I set up a minimal example (following this question and this blog) in which I only generate b:
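library(doParallel)
# Set up the parallel backend and keep the cluster handle so it can be stopped
cl <- makeCluster(3L)
registerDoParallel(cl)
b <- foreach(i = 1:nrow(f), .combine = rbind) %dopar% {
  tempB <- do_something_function()
  tempB
}
stopCluster(cl)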
That example works perfectly, but I'm missing two data frames. I found other answers, but I do believe my case is different:
I could change a to be a data frame of the rows that will later be removed, but I need to merge each tempA only with the other tempA results, if that makes any sense. In the previous questions I linked, all of the outputs are mixed together.
Upvotes: 2
Views: 3529
Reputation: 11738
It seems that your problem is not about parallelism, but rather about combining the results.
Here is an example of how I would do it (which I think is the most efficient way):
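library(foreach)
# Each iteration returns a list with one element per output data frame
tmp <- foreach(i = seq_len(32)) %do% {
  list(iris[i, ], mtcars[i, ], iris[i, ])
}
# Regroup the results by position, then row-bind each group into one data frame
lapply(purrr::transpose(tmp), function(l) do.call(rbind, l))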
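If purrr is not available, the same regrouping can be sketched in base R. This is only an illustration of the idea above, assuming every iteration returns a list with the same number of elements in the same order. The helper name `bind_by_position` is hypothetical, and `lapply` stands in for the `foreach` loop so that the snippet needs no extra packages.

```r
# Stand-in for the foreach result: one list of three one-row data frames
# per iteration (same shape as `tmp` above).
tmp <- lapply(seq_len(32), function(i) {
  list(iris[i, ], mtcars[i, ], iris[i, ])
})

# Hypothetical helper: for each position k, collect the k-th element of
# every iteration and row-bind them into a single data frame.
bind_by_position <- function(results) {
  lapply(seq_along(results[[1]]), function(k) {
    do.call(rbind, lapply(results, `[[`, k))
  })
}

out <- bind_by_position(tmp)
dim(out[[1]])  # rows taken from iris
dim(out[[2]])  # rows taken from mtcars
```

Each element of `out` then holds the rows of one output only, so the data frames with different structures are never bound together.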
Upvotes: 2
Reputation: 1285
I found this solution so far. Instead of removing rows from a, I'm creating a data frame that holds the rows that will be deleted. I wrote a combine function:
combine <- function(x, ...) {
  mapply(rbind, x, ..., SIMPLIFY = FALSE)
}
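To see what this combine function does without starting a cluster, it can be called directly on a few hand-made results. The data frames and column names below are made up for illustration, and each `iter*` list plays the role of one iteration's `list(tempA, tempB, tempC)`.

```r
# Same combine function as above.
combine <- function(x, ...) {
  mapply(rbind, x, ..., SIMPLIFY = FALSE)
}

# Hypothetical per-iteration results: three data frames with different columns.
iter1 <- list(data.frame(id = 1L), data.frame(x = 0.5, y = "u"), data.frame(flag = TRUE))
iter2 <- list(data.frame(id = 4L), data.frame(x = 1.5, y = "v"), data.frame(flag = FALSE))
iter3 <- list(data.frame(id = 9L), data.frame(x = 2.5, y = "w"), data.frame(flag = TRUE))

# With .multicombine = TRUE, foreach may pass several results in one call.
output <- combine(iter1, iter2, iter3)

output[[1]]$id      # ids only, never mixed with the other frames
names(output[[2]])  # columns of the second frame are preserved
```

Because the value returned by `combine` has the same shape as each of its inputs (a list of three data frames), feeding an accumulated result back into `combine` together with new results should behave the same way.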
And my loop is something like this:
library(doParallel)
# Set up the parallel backend and keep the cluster handle so it can be stopped
cl <- makeCluster(3L)
registerDoParallel(cl)
# Loop
output <- foreach(i = 1:nrow(f), .combine = combine, .multicombine = TRUE) %dopar% {
  tempA <- get_this_value()
  tempB <- do_something_function()
  tempC <- get_this_other_frame()
  # Return the values
  list(tempA, tempB, tempC)
}
stopCluster(cl)
Then, I access the data using output[[1]] and so on. However, with this solution I still have to do a setdiff or anti_join after the loop to remove the "undesired" rows from a.
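A minimal sketch of that last step in base R, assuming a has a column that uniquely identifies each row. The column name `id`, the sample values, and the names `removed` and `a_kept` are all hypothetical; `removed` stands in for `output[[1]]`.

```r
# Hypothetical original data frame with a unique key column `id`.
a <- data.frame(id = 1:6, value = c(10, 20, 30, 40, 50, 60))

# Stand-in for output[[1]]: the rows collected for deletion in the loop.
removed <- data.frame(id = c(2L, 5L), value = c(20, 50))

# Keep only the rows of `a` whose key does not appear in `removed`.
a_kept <- a[!(a$id %in% removed$id), , drop = FALSE]

a_kept$id
```

With dplyr loaded, `anti_join(a, removed, by = "id")` would express the same filter. Without a unique key, the comparison would have to be made on whole rows instead.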
Upvotes: 0