iGada

Reputation: 643

Using foreach instead of nested for loop

My objective is to use foreach instead of a conventional nested for loop to reduce running time. Consider the following case.

a <- c(1,2)
b <- c(3,4)
c <- c(1,2)

myfunc <- function(a, b, c) {
  x1 <- a * rnorm(5, 0, 1)
  x2 <- b * rnorm(5, 0, 1)
  x3 <- c * rnorm(5, 0, 1)
  xxx <- cbind(x1, x2, x3)
  return(as.data.frame(xxx))
}

Using conventional nested for loops, I can simulate data for every combination of a, b, and c.

# Using for loop.

df1 <- NULL
for(i in a) {
  for(j in b) {
    for (k in c) {
      df1 <- rbind(df1, myfunc(i,j,k))
    }
  }
}

How can I do this with a foreach loop? I tried the following, but I'm not sure whether my code generates the intended data. Thanks!

# Using foreach loop.

library(foreach)
library(doParallel)
cl <- makeCluster(7) 
registerDoParallel(cl)

df2 <- foreach(i = a, .combine = 'rbind') %:%
        foreach(j = b, .combine = 'rbind') %:%
        foreach(k = c, .combine = 'rbind') %dopar% {
        xx <- myfunc(i,j,k)
        return(xx)
      }
df2
stopCluster(cl)  # shut the workers down when finished

Upvotes: 0

Views: 120

Answers (1)

socialscientist

Reputation: 4272

You can skip the nested loops entirely by using CJ() from the data.table package, as shown below. If you prefer not to use extra packages, expand.grid() is a much slower base R alternative: just call expand.grid(x, y, z).

library(data.table)

# Example vectors
x <- c(1, 2)
y <- c(3, 4)
z <- c(1, 2)

# Make a single object
dt <- CJ(x,y,z)

dt
#>    x y z
#> 1: 1 3 1
#> 2: 1 3 2
#> 3: 1 4 1
#> 4: 1 4 2
#> 5: 2 3 1
#> 6: 2 3 2
#> 7: 2 4 1
#> 8: 2 4 2
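For comparison, here is a minimal base R sketch of the same grid built with expand.grid(); the combinations are identical, but expand.grid() varies the first column fastest, so the row order differs from the sorted output of CJ().

# Base R equivalent (no extra packages, but slower on large grids)
dg <- expand.grid(x = x, y = y, z = z)
dg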

CJ() is extremely fast and benefits from data.table's built-in multithreading. You can speed things up further by splitting the input vectors into equal-length pieces and treating each piece as a grouping factor within which parallel processing can occur; this is essentially manual "chunking", as sketched below.
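As a rough illustration of that chunking idea (a sketch only, reusing the example vectors above and an arbitrary chunk count), you could split one input vector into equal-sized pieces, build the sub-grid for each piece, and stack the results:

# Manual "chunking" sketch: split x into equal-sized pieces, build the
# sub-grid for each piece, and stack the pieces back together. Each piece
# is an independent unit of work that could be handled by a parallel worker.
n_chunks <- 2
pieces <- split(x, cut(seq_along(x), n_chunks, labels = FALSE))
sub_grids <- lapply(pieces, function(xc) CJ(x = xc, y = y, z = z))
dt_chunked <- rbindlist(sub_grids)  # same combinations as dt, built piece by piece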

Once you have this object, you can use foreach() to parallelize your code in many different ways. For example, you could split dt into a list of smaller data.tables by row or by column. Working by rows, you would generate a data.frame per group of rows and stack them with rbindlist(), bind_rows(), or rbind(), depending on your approach; see the sketch right after this paragraph. Alternatively, you could operate on each column in parallel and bind the results together instead.
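For instance, here is a minimal sketch of the row-wise split (assuming the dt built above and the foreach setup from your question; with no backend registered it simply runs sequentially):

# Split dt into two contiguous groups of rows, simulate within each group,
# and stack the per-group results at the end.
groups <- split(dt, cut(seq_len(nrow(dt)), 2, labels = FALSE))

out_by_group <- foreach(g = groups, .packages = "data.table") %dopar% {
  rbindlist(lapply(seq_len(nrow(g)), function(i)
    data.table(
      x = g$x[i] * rnorm(5),
      y = g$y[i] * rnorm(5),
      z = g$z[i] * rnorm(5)
    )
  ))
}

res <- rbindlist(out_by_group)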

Here's an example going row by row with foreach() to produce your output. I haven't registered a parallel backend here, so it warns and runs sequentially for now:

library(foreach)
set.seed(123)

# For each row of dt, multiply each of the 3 columns by 5 random values from
# N(0,1) and return a 5-row, 3-column data.table. Note that inside the loop,
# x is the current one-row slice of dt, not the original vector x.
output <- foreach(x = iterators::iter(dt, by = "row")) %dopar%
  data.table(
    x = x$x * rnorm(5),
    y = x$y * rnorm(5),
    z = x$z * rnorm(5)
  )
#> Warning: executing %dopar% sequentially: no parallel backend registered

head(rbindlist(output), 10)
#>               x          y          z
#>  1: -0.56047565  5.1451950  1.2240818
#>  2: -0.23017749  1.3827486  0.3598138
#>  3:  1.55870831 -3.7951837  0.4007715
#>  4:  0.07050839 -2.0605586  0.1106827
#>  5:  0.12928774 -1.3369859 -0.5558411
#>  6:  1.78691314 -3.2034711 -3.3733866
#>  7:  0.49785048 -0.6539247  1.6755741
#>  8: -1.96661716 -3.0780133  0.3067462
#>  9:  0.70135590 -2.1866737 -2.2762739
#> 10: -0.47279141 -1.8751178  2.5076298
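And if you want exactly what your original nested loop produced, i.e. myfunc() applied to every combination, the same row-wise iteration works once a backend is registered. Here is a sketch, assuming your myfunc() from the question and the dt built above (df3 and the worker count are just placeholders):

library(doParallel)

cl <- makeCluster(2)   # pick a worker count that suits your machine
registerDoParallel(cl)

df3 <- foreach(r = iterators::iter(dt, by = "row"),
               .combine = rbind) %dopar%
  myfunc(r$x, r$y, r$z)

stopCluster(cl)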

Upvotes: 1
