Daniel Restrepo Masi
Daniel Restrepo Masi

Reputation: 13

Save results in nested loop using a function

I'm trying to get the results in a nested loop (4) using a function, the print is correct but I cant get the resuls in a dataframe. I have this:

# Just a placeholder for a function to call on 
# each iteration
my_function <- function(...) c(...)

for (t in 1:5)
  for (s in 1:5)
    for (j in 1:4)
      for (i in 1:2)
        print(c (t,s,j,i,my_function(j,t,s,i)))

Which get the correct results, but I cant store it, I'd tried:

c=c()
for (h in 1:200)
for (t in 1:5)
  for (s in 1:5)
    for (j in 1:4)
      for (i in 1:2)
        c[[h]] = c(t,s,j,i,my_function(j,t,s,i))

But doesn´t work, thanks.

Upvotes: 1

Views: 1020

Answers (3)

Uwe
Uwe

Reputation: 42544

OP's original attempt

You were almost there with your original attempt: You only need to advance the index h independently of the nested loops.

At every iteration of the innermost loop where the function is evaluated, h is advanced by one to store the result of this iteration in the next available storage location:

c1 <- c()
h <- 0L
for (t in 1:5)
  for (s in 1:5)
    for (j in 1:4)
      for (i in 1:2) {
        h <- h + 1L
        c1[[h]] = c(t, s, j, i, your_function(j, t, s, i))
      }

This will return a list of vectors in c1. The list contains 200 = 5 * 5 * 4 * 2 entries.

(BTW, OP's own answer creates a matrix of 5 rows and 200 columns.)

Alternative: no loops

R offers functions which get the result in one go without loops. The base idea of all three variants below is to create all combinations of t, s, j, and i first and then to apply the function on these input data.

Base R

eg <- expand.grid(t = 1:5, s = 1:5, j = 1:4, i = 1:2)
eg[, "f"] <- do.call(mapply, c(your_function, eg))

This will return a data.frame with 5 columns and 200 rows.

tidyverse

library(dplyr)
library(tidyr)
library(purrr)
crossing(t = 1:5, s = 1:5, j = 1:4, i = 1:2) %>% 
  mutate(f = pmap_dbl(., your_function))

This will return a tibble (enhanced data.frame) with 5 columns and 200 rows.

data.table

library(data.table)
CJ(t = 1:5, s = 1:5, j = 1:4, i = 1:2)[, .(t, s, j, i, f = your_function(j, t, s, i))]

This will return a data.table (enhanced data.frame) with 5 columns and 200 rows.

Benchmarking

I was surprised and disappointed to find a high rep SO user like akrun seriously is suggesting to use appending instead of indexing. Growing objects is considered inefficient and bad practice as discussed in Chapter 2 of Patrick Burns' book The R Inferno, for instance.

Of course, appending requires less typing but the quickest answer is not always the best. This can be verified by benchmarking the different approaches for different problem sizes.

The benchmark below compares execution times and memory consumption for

  • growing objects by
    • appending to a matrix using cbind() (column-wise, as in OP's own answer)
    • appending to a matrix using rbind() (row-wise)
    • appending to a list using c()
  • indexing
    • storing column-wise in a pre-allocated matrix
    • storing row-wise in a pre-allocated matrix
    • storing in a list (as attempted in OP's original approach)
  • no loop using
    • mapply() (base R)
    • pmap() (tidyverse)
    • data.table

For benchmarking different problem sizes, the length of the innermost loop over i is varied.

Finally, we need to define a function which is cheap to evaluate:

fct <- function(j, t, s, i) ((10*j + t)*10 + s)*10 + i

For benchmarking the bench package is used. Checking the results has been turned off because the results of the different approaches differ in class and shape.

library(bench)
library(ggplot2)
library(dplyr)
library(tidyr)
library(purrr)
library(data.table)
bm <- press(
  ni = 2L * 10^(0:2),
  {
    nt <- 5L
    ns <- 5L
    nj <- 4L
    ntsji <- nt * ns * nj * ni
    cat(ntsji, "\n")
    mark(
      grow_cbind = {
        c1 <- c()
        for (t in seq(nt)) 
          for (s in seq(ns)) 
            for (j in seq(nj)) 
              for (i in seq(ni)) {
                c1 <- cbind(c1, c(t, s, j, i, fct(j, t, s, i)))
        }
      },
      idx_col = {
        c1 <- matrix(nrow = 5L, ncol = ntsji)
        icol <- 0L
        for (t in seq(nt)) 
          for (s in seq(ns)) 
            for (j in seq(nj)) 
              for (i in seq(ni)) {
                icol <- icol + 1L
                c1[ , icol] <- c(t, s, j, i, fct(j, t, s, i))
              }
      },
      grow_rbind = {
        c1 <- c()
        for (t in seq(nt)) 
          for (s in seq(ns)) 
            for (j in seq(nj)) 
              for (i in seq(ni)) {
                c1 <- rbind(c1, c(t, s, j, i, fct(j, t, s, i)))
              }
      },
      idx_row = {
        c1 <- matrix(nrow = ntsji, ncol = 5L)
        irow <- 0L
        for (t in seq(nt)) 
          for (s in seq(ns)) 
            for (j in seq(nj)) 
              for (i in seq(ni)) {
                irow <- irow + 1L
                c1[irow, ] <- c(t, s, j, i, fct(j, t, s, i))
              }
      },
      grow_list = {
        c1 <- list()
        for (t in seq(nt)) 
          for (s in seq(ns)) 
            for (j in seq(nj)) 
              for (i in seq(ni)) {
                c1 <- c(c1, list(c(t, s, j, i, fct(j, t, s, i))))
              }
      },
      idx_list = {
        c1 <- list(ntsji)
        idx <- 0L
        for (t in seq(nt)) 
          for (s in seq(ns)) 
            for (j in seq(nj)) 
              for (i in seq(ni)) {
                idx <- idx + 1L
                c1[[idx]] <- c(t, s, j, i, fct(j, t, s, i))
              }
      },
      nol_mapply = {
        eg <- expand.grid(t = seq(nt), s = seq(ns), j = seq(nj), i = seq(ni))
        eg[, "f"] <- do.call(mapply, c(fct, eg))
      },
      nol_pmap = {
        crossing(t = seq(nt), s = seq(ns), j = seq(nj), i = seq(ni)) %>% 
          mutate(f = pmap_dbl(., fct))
      },
      nol_dt = {
        CJ(t = seq(nt), s = seq(ns), j = seq(nj), i = seq(ni))[
          , .(t, s, j, i, f = fct(j, t, s, i))]
      },
      check = FALSE
    )}
)

The timings are plotted using a logarithmic time scale:

autoplot(bm)

enter image description here

The chart shows that

  • growing objects is getting slower with increasing problem size compared to the other approaches. For the largest problem size, it is more than a magnitude slower.
  • the way the output object is created (list, matrix column-wise or matrix row-wise) has not much effect on execution speed.
  • the data.table approach is the fastest for larger problem sizes. For the largest problem size, it is more than a magnitude faster than the second fastest and 3 magnitudes (1000 times) faster than growing objects.

Memory consumption is shown in column mem_alloc below:

print(bm, n = Inf)
# A tibble: 27 x 14
   expression    ni      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
   <bch:expr> <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
 1 grow_cbind     2   1.58ms   1.88ms   484.     794.16KB     8.73   222     4   458.41ms
 2 idx_col        2   1.44ms   1.76ms   489.      12.12KB     6.58   223     3   455.64ms
 3 grow_rbind     2    1.7ms      2ms   445.     794.16KB    10.6    126     3   283.15ms
 4 idx_row        2   1.41ms   1.59ms   548.      11.81KB     6.37   258     3   471.22ms
 5 grow_list      2   1.34ms   1.59ms   574.     164.59KB     4.17   275     2   479.38ms
 6 idx_list       2   1.38ms   1.62ms   561.      29.95KB     6.37   264     3    470.7ms
 7 nol_mapply     2  714.4us  821.8us  1108.      15.64KB     8.52   520     4    469.3ms
 8 nol_pmap       2   7.25ms    8.7ms   111.      22.77KB     4.28    52     2   467.69ms
 9 nol_dt         2   1.58ms   1.99ms   460.      78.91KB     2.03   226     1   491.48ms
10 grow_cbind    20  21.86ms  23.24ms    38.0     76.42MB    69.7      6    11   157.74ms
11 idx_col       20    6.9ms   7.92ms   115.     117.59KB     6.53    53     3   459.29ms
12 grow_rbind    20  24.12ms  25.16ms    39.6     76.42MB    87.1      5    11   126.26ms
13 idx_row       20   6.67ms   7.77ms   118.     117.28KB     6.55    54     3   458.18ms
14 grow_list     20  21.36ms  23.13ms    42.7     15.36MB     7.11    18     3   421.72ms
15 idx_list      20   7.44ms   8.36ms   112.     333.92KB     6.59    51     3   455.35ms
16 nol_mapply    20   4.99ms   5.87ms   167.     142.55KB     9.02    74     4   443.29ms
17 nol_pmap      20  13.28ms   15.1ms    62.0    114.36KB     6.89    27     3    435.4ms
18 nol_dt        20   1.67ms    2.1ms   422.     198.44KB     2.05   206     1   488.67ms
19 grow_cbind   200     2.3s     2.3s     0.434    7.45GB    33.9      1    78       2.3s
20 idx_col      200  66.01ms  74.33ms    13.7      1.14MB     7.85     7     4   509.42ms
21 grow_rbind   200    2.43s    2.43s     0.412    7.45GB    32.1      1    78      2.43s
22 idx_row      200   65.9ms  75.92ms    13.6      1.14MB     7.78     7     4   514.05ms
23 grow_list    200    2.04s    2.04s     0.490    1.49GB     7.35     1    15      2.04s
24 idx_list     200  71.34ms   77.3ms    12.3       3.3MB     7.01     7     4   570.57ms
25 nol_mapply   200  52.63ms  64.19ms    14.8      1.48MB     9.26     8     5   540.07ms
26 nol_pmap     200  74.69ms  82.26ms    11.7      1.01MB     7.79     6     4   513.76ms
27 nol_dt       200   2.32ms   2.92ms   259.       1.36MB     1.99   130     1   501.78ms
# ... with 4 more variables: result <list>, memory <list>, time <list>, gc <list>

The table shows clearly that

  • growing objects allocates much more memory than the other approaches. For the largest problem size, growing a matrix allocates 7000 times more memory than the indexing or no loop approaches.

Session info

Excerpt from

sessioninfo::session_info()
- Session info -------------------------------------------------------------------------
 setting  value                       
 version  R version 4.0.0 (2020-04-24)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     

- Packages -----------------------------------------------------------------------------
 package     * version date       lib source        
 bench       * 1.1.1   2020-01-13 [1] CRAN (R 4.0.0)
 data.table  * 1.12.9  2020-05-26 [1] local         
 dplyr       * 1.0.0   2020-05-29 [1] CRAN (R 4.0.0)
 ggplot2     * 3.3.1   2020-05-28 [1] CRAN (R 4.0.0)
 purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.0.0)
 tidyr       * 1.1.0   2020-05-20 [1] CRAN (R 4.0.0)

Upvotes: 1

Daniel Restrepo Masi
Daniel Restrepo Masi

Reputation: 13

Worked with

c1 <- c()
for (t in 1:t) {
  for (s in 1:s) {
    for (j in 1:4) {
      for (i in 1:2){
        c1 <- cbind(c1, c(t,s,j,i,function(j,t,s,i)))
      }
    }
  }
}

@akrun Crack

Upvotes: 0

akrun
akrun

Reputation: 887158

As we initialize with a NULL vector, we can append instead of indexing

c1 <- c()
for (h in 1:200){
   for (t in 1:5) {
      for (s in 1:5) {
        for (j in 1:4) {
           for (i in 1:2){
                 c1 <- c(c1, c(t,s,j,i,function(j,t,s,i)))
           }
          }
         }
         }
       }

NOTE: c is a function name, it is better to initialize with a different object name ('c1')

Upvotes: 1

Related Questions