Reputation: 13
I'm trying to get the results in a nested loop (4) using a function, the print is correct but I cant get the resuls in a dataframe. I have this:
# Just a placeholder for a function to call on
# each iteration
my_function <- function(...) c(...)
for (t in 1:5)
for (s in 1:5)
for (j in 1:4)
for (i in 1:2)
print(c (t,s,j,i,my_function(j,t,s,i)))
Which get the correct results, but I cant store it, I'd tried:
c=c()
for (h in 1:200)
for (t in 1:5)
for (s in 1:5)
for (j in 1:4)
for (i in 1:2)
c[[h]] = c(t,s,j,i,my_function(j,t,s,i))
But doesn´t work, thanks.
Upvotes: 1
Views: 1020
Reputation: 42544
You were almost there with your original attempt: You only need to advance the index h
independently of the nested loops.
At every iteration of the innermost loop where the function is evaluated, h
is advanced by one to store the result of this iteration in the next available storage location:
c1 <- c()
h <- 0L
for (t in 1:5)
for (s in 1:5)
for (j in 1:4)
for (i in 1:2) {
h <- h + 1L
c1[[h]] = c(t, s, j, i, your_function(j, t, s, i))
}
This will return a list of vectors in c1
. The list contains 200 = 5 * 5 * 4 * 2 entries.
(BTW, OP's own answer creates a matrix of 5 rows and 200 columns.)
R offers functions which get the result in one go without loops. The base idea of all three variants below is to create all combinations of t
, s
, j
, and i
first and then to apply the function on these input data.
Base R
eg <- expand.grid(t = 1:5, s = 1:5, j = 1:4, i = 1:2)
eg[, "f"] <- do.call(mapply, c(your_function, eg))
This will return a data.frame with 5 columns and 200 rows.
tidyverse
library(dplyr)
library(tidyr)
library(purrr)
crossing(t = 1:5, s = 1:5, j = 1:4, i = 1:2) %>%
mutate(f = pmap_dbl(., your_function))
This will return a tibble (enhanced data.frame) with 5 columns and 200 rows.
data.table
library(data.table)
CJ(t = 1:5, s = 1:5, j = 1:4, i = 1:2)[, .(t, s, j, i, f = your_function(j, t, s, i))]
This will return a data.table (enhanced data.frame) with 5 columns and 200 rows.
I was surprised and disappointed to find a high rep SO user like akrun seriously is suggesting to use appending instead of indexing. Growing objects is considered inefficient and bad practice as discussed in Chapter 2 of Patrick Burns' book The R Inferno, for instance.
Of course, appending requires less typing but the quickest answer is not always the best. This can be verified by benchmarking the different approaches for different problem sizes.
The benchmark below compares execution times and memory consumption for
cbind()
(column-wise, as in OP's own answer)rbind()
(row-wise)c()
mapply()
(base R)pmap()
(tidyverse)For benchmarking different problem sizes, the length of the innermost loop over i
is varied.
Finally, we need to define a function which is cheap to evaluate:
fct <- function(j, t, s, i) ((10*j + t)*10 + s)*10 + i
For benchmarking the bench
package is used. Checking the results has been turned off because the results of the different approaches differ in class and shape.
library(bench)
library(ggplot2)
library(dplyr)
library(tidyr)
library(purrr)
library(data.table)
bm <- press(
ni = 2L * 10^(0:2),
{
nt <- 5L
ns <- 5L
nj <- 4L
ntsji <- nt * ns * nj * ni
cat(ntsji, "\n")
mark(
grow_cbind = {
c1 <- c()
for (t in seq(nt))
for (s in seq(ns))
for (j in seq(nj))
for (i in seq(ni)) {
c1 <- cbind(c1, c(t, s, j, i, fct(j, t, s, i)))
}
},
idx_col = {
c1 <- matrix(nrow = 5L, ncol = ntsji)
icol <- 0L
for (t in seq(nt))
for (s in seq(ns))
for (j in seq(nj))
for (i in seq(ni)) {
icol <- icol + 1L
c1[ , icol] <- c(t, s, j, i, fct(j, t, s, i))
}
},
grow_rbind = {
c1 <- c()
for (t in seq(nt))
for (s in seq(ns))
for (j in seq(nj))
for (i in seq(ni)) {
c1 <- rbind(c1, c(t, s, j, i, fct(j, t, s, i)))
}
},
idx_row = {
c1 <- matrix(nrow = ntsji, ncol = 5L)
irow <- 0L
for (t in seq(nt))
for (s in seq(ns))
for (j in seq(nj))
for (i in seq(ni)) {
irow <- irow + 1L
c1[irow, ] <- c(t, s, j, i, fct(j, t, s, i))
}
},
grow_list = {
c1 <- list()
for (t in seq(nt))
for (s in seq(ns))
for (j in seq(nj))
for (i in seq(ni)) {
c1 <- c(c1, list(c(t, s, j, i, fct(j, t, s, i))))
}
},
idx_list = {
c1 <- list(ntsji)
idx <- 0L
for (t in seq(nt))
for (s in seq(ns))
for (j in seq(nj))
for (i in seq(ni)) {
idx <- idx + 1L
c1[[idx]] <- c(t, s, j, i, fct(j, t, s, i))
}
},
nol_mapply = {
eg <- expand.grid(t = seq(nt), s = seq(ns), j = seq(nj), i = seq(ni))
eg[, "f"] <- do.call(mapply, c(fct, eg))
},
nol_pmap = {
crossing(t = seq(nt), s = seq(ns), j = seq(nj), i = seq(ni)) %>%
mutate(f = pmap_dbl(., fct))
},
nol_dt = {
CJ(t = seq(nt), s = seq(ns), j = seq(nj), i = seq(ni))[
, .(t, s, j, i, f = fct(j, t, s, i))]
},
check = FALSE
)}
)
The timings are plotted using a logarithmic time scale:
autoplot(bm)
The chart shows that
data.table
approach is the fastest for larger problem sizes. For the largest problem size, it is more than a magnitude faster than the second fastest and 3 magnitudes (1000 times) faster than growing objects.Memory consumption is shown in column mem_alloc
below:
print(bm, n = Inf)
# A tibble: 27 x 14
expression ni min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 grow_cbind 2 1.58ms 1.88ms 484. 794.16KB 8.73 222 4 458.41ms
2 idx_col 2 1.44ms 1.76ms 489. 12.12KB 6.58 223 3 455.64ms
3 grow_rbind 2 1.7ms 2ms 445. 794.16KB 10.6 126 3 283.15ms
4 idx_row 2 1.41ms 1.59ms 548. 11.81KB 6.37 258 3 471.22ms
5 grow_list 2 1.34ms 1.59ms 574. 164.59KB 4.17 275 2 479.38ms
6 idx_list 2 1.38ms 1.62ms 561. 29.95KB 6.37 264 3 470.7ms
7 nol_mapply 2 714.4us 821.8us 1108. 15.64KB 8.52 520 4 469.3ms
8 nol_pmap 2 7.25ms 8.7ms 111. 22.77KB 4.28 52 2 467.69ms
9 nol_dt 2 1.58ms 1.99ms 460. 78.91KB 2.03 226 1 491.48ms
10 grow_cbind 20 21.86ms 23.24ms 38.0 76.42MB 69.7 6 11 157.74ms
11 idx_col 20 6.9ms 7.92ms 115. 117.59KB 6.53 53 3 459.29ms
12 grow_rbind 20 24.12ms 25.16ms 39.6 76.42MB 87.1 5 11 126.26ms
13 idx_row 20 6.67ms 7.77ms 118. 117.28KB 6.55 54 3 458.18ms
14 grow_list 20 21.36ms 23.13ms 42.7 15.36MB 7.11 18 3 421.72ms
15 idx_list 20 7.44ms 8.36ms 112. 333.92KB 6.59 51 3 455.35ms
16 nol_mapply 20 4.99ms 5.87ms 167. 142.55KB 9.02 74 4 443.29ms
17 nol_pmap 20 13.28ms 15.1ms 62.0 114.36KB 6.89 27 3 435.4ms
18 nol_dt 20 1.67ms 2.1ms 422. 198.44KB 2.05 206 1 488.67ms
19 grow_cbind 200 2.3s 2.3s 0.434 7.45GB 33.9 1 78 2.3s
20 idx_col 200 66.01ms 74.33ms 13.7 1.14MB 7.85 7 4 509.42ms
21 grow_rbind 200 2.43s 2.43s 0.412 7.45GB 32.1 1 78 2.43s
22 idx_row 200 65.9ms 75.92ms 13.6 1.14MB 7.78 7 4 514.05ms
23 grow_list 200 2.04s 2.04s 0.490 1.49GB 7.35 1 15 2.04s
24 idx_list 200 71.34ms 77.3ms 12.3 3.3MB 7.01 7 4 570.57ms
25 nol_mapply 200 52.63ms 64.19ms 14.8 1.48MB 9.26 8 5 540.07ms
26 nol_pmap 200 74.69ms 82.26ms 11.7 1.01MB 7.79 6 4 513.76ms
27 nol_dt 200 2.32ms 2.92ms 259. 1.36MB 1.99 130 1 501.78ms
# ... with 4 more variables: result <list>, memory <list>, time <list>, gc <list>
The table shows clearly that
Excerpt from
sessioninfo::session_info()
- Session info ------------------------------------------------------------------------- setting value version R version 4.0.0 (2020-04-24) os Windows 10 x64 system x86_64, mingw32 ui RStudio - Packages ----------------------------------------------------------------------------- package * version date lib source bench * 1.1.1 2020-01-13 [1] CRAN (R 4.0.0) data.table * 1.12.9 2020-05-26 [1] local dplyr * 1.0.0 2020-05-29 [1] CRAN (R 4.0.0) ggplot2 * 3.3.1 2020-05-28 [1] CRAN (R 4.0.0) purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.0.0) tidyr * 1.1.0 2020-05-20 [1] CRAN (R 4.0.0)
Upvotes: 1
Reputation: 13
Worked with
c1 <- c()
for (t in 1:t) {
for (s in 1:s) {
for (j in 1:4) {
for (i in 1:2){
c1 <- cbind(c1, c(t,s,j,i,function(j,t,s,i)))
}
}
}
}
@akrun Crack
Upvotes: 0
Reputation: 887158
As we initialize with a NULL vector, we can append instead of indexing
c1 <- c()
for (h in 1:200){
for (t in 1:5) {
for (s in 1:5) {
for (j in 1:4) {
for (i in 1:2){
c1 <- c(c1, c(t,s,j,i,function(j,t,s,i)))
}
}
}
}
}
NOTE: c
is a function name, it is better to initialize with a different object name ('c1')
Upvotes: 1