David

Reputation: 8308

Sample 1 element from each vector in a list in R to create groups of samples

I have a list in which each element is a vector of values.

I want to sample a single element from each vector and combine the sampled elements into a group.

An example of such a list is:

xxx_ <- list(c(13L, 15L, 5L, 6L), c(7L, 20L, 14L, 18L, 1L, 8L, 17L), 
    c(9L, 11L, 4L, 12L), c(16L, 19L, 10L, 2L, 3L))

I was doing the following, but it feels like there should be a simpler way of sampling from this list.

l_sample <- list()
for(g in 1:10) {
  l <- c()
  for(i in 1:4) {
    l <- c(l,sample(xxx_[[i]], 1))
  }
  l_sample[[g]] <- l
}

Which gives the following result:

> l_sample
[[1]]
[1] 15  7 12 10

[[2]]
[1] 6 1 4 2

[[3]]
[1] 13 18  4 19

[[4]]
[1]  6 17  4  2

[[5]]
[1] 15 18  4  3

[[6]]
[1] 13 18  9  3

[[7]]
[1]  6 17 12 19

[[8]]
[1]  5 20  9 19

[[9]]
[1]  5 18  9 10

[[10]]
[1] 13  7  9  3

I also wanted to append each sample to a data frame as a new row, with each element in its own column, but I couldn't get it to work.

Something like:

> df
  g1 g2 g3 g4
1 15  7 12 10
2  6  1  4  2
...

Would appreciate some help.

Upvotes: 1

Views: 524

Answers (2)

Benjamin Christoffersen

Reputation: 4841

To draw 10 samples from each vector in the list, pass size = 10 and replace = TRUE to sample through sapply. Each row of the result is then one group:

set.seed(1)
sapply(xxx_, sample, 10, TRUE)
#R>       [,1] [,2] [,3] [,4]
#R>  [1,]   13   14    9   19
#R>  [2,]    6   14    4   16
#R>  [3,]    5    7    9    2
#R>  [4,]   13    1    9   16
#R>  [5,]   15    1    9    2
#R>  [6,]   13   20    9   10
#R>  [7,]    5    8   11   19
#R>  [8,]    5    8    9   19
#R>  [9,]   15   20    9    2
#R> [10,]   15   17   11    2

Change the 10 to the number of draws you want to make.
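One caveat worth keeping in mind: sample() treats a single number x >= 1 as 1:x, so a group holding only one value would not be sampled as intended. A minimal sketch of a workaround, assuming a hypothetical helper named sample_groups (not part of the original answer) that draws positions with sample.int instead:

```r
# Hypothetical helper: draw n values per group, safe for length-one groups.
sample_groups <- function(groups, n) {
  # sample.int() draws positions, so a group such as c(7L) always yields 7
  # rather than a draw from 1:7.
  sapply(groups, function(x) x[sample.int(length(x), n, replace = TRUE)])
}

xxx_ <- list(c(13L, 15L, 5L, 6L), c(7L, 20L, 14L, 18L, 1L, 8L, 17L),
             c(9L, 11L, 4L, 12L), c(16L, 19L, 10L, 2L, 3L))

set.seed(1)
res <- sample_groups(xxx_, 10)
dim(res)  # 10 rows (draws) by 4 columns (groups)
```

Note that with n = 1, sapply simplifies to a plain vector rather than a one-row matrix, so this sketch is mainly useful for n > 1.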

This approach also has the advantage that it will keep the names. As an example, say your data looked like:

xxx_ <- list(
  g1 = c(13L, 15L, 5L, 6L ), g2 = c(7L , 20L, 14L, 18L, 1L, 8L, 17L), 
  g3 = c(9L , 11L, 4L, 12L), g4 = c(16L, 19L, 10L, 2L, 3L))

Then you can do the following to get a data.frame like you request:

set.seed(1)
as.data.frame(sapply(xxx_, sample, 10, TRUE))
#R>    g1 g2 g3 g4
#R> 1  13 14  9 19
#R> 2   6 14  4 16
#R> 3   5  7  9  2
#R> 4  13  1  9 16
#R> 5  15  1  9  2
#R> 6  13 20  9 10
#R> 7   5  8 11 19
#R> 8   5  8  9 19
#R> 9  15 20  9  2
#R> 10 15 17 11  2
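If the samples already exist as a list of vectors, as with l_sample in the question, they can probably be stacked into a data.frame directly. A small sketch, using only the first two groups shown in the question as input; the g1 to g4 column names are taken from the requested output:

```r
# First two groups of samples as printed in the question.
l_sample <- list(c(15L, 7L, 12L, 10L), c(6L, 1L, 4L, 2L))

# rbind() stacks each sampled vector as one row of a matrix.
df <- as.data.frame(do.call(rbind, l_sample))
names(df) <- paste0("g", seq_len(ncol(df)))
df
#   g1 g2 g3 g4
# 1 15  7 12 10
# 2  6  1  4  2
```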

Speed

It is much faster than calling replicate, roughly nine times faster by median time in this benchmark:

bench::mark(
    replicate = t(replicate(10, sapply(xxx_, sample, 1))), 
    sapply    = sapply(xxx_, sample, 10, TRUE), 
    min_time = 1, check = FALSE)
#R> # A tibble: 2 x 13
#R>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory                  time              gc                   
#R>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>                  <list>            <list>               
#R> 1 replicate   213.3µs    235µs     4071.   100.3KB     45.7  3384    38      831ms <NULL> <Rprofmem[,3] [43 × 3]> <bch:tm [3,422]>  <tibble [3,422 × 3]> 
#R> 2 sapply       22.4µs   25.2µs    38417.    10.4KB     46.2  9988    12      260ms <NULL> <Rprofmem[,3] [6 × 3]>  <bch:tm [10,000]> <tibble [10,000 × 3]>

Upvotes: 1

Ronak Shah

Reputation: 389065

You could use sapply to select 1 element from each list and use replicate to repeat it 10 times.

t(replicate(10, sapply(xxx_, sample, 1)))

#      [,1] [,2] [,3] [,4]
# [1,]   15    7    9   10
# [2,]   15    8    9    3
# [3,]   13   14    4   19
# [4,]    5   14   12   10
# [5,]   13   20    9    3
# [6,]    5   18   12   16
# [7,]    5    1   11    2
# [8,]    6   14   11   19
# [9,]    5    8   12    3
#[10,]    5   17    4    2
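The result above is a matrix. To get the data frame with g1 to g4 columns that the question asks for, one option is to convert it and set the names afterwards. A minimal sketch; the seed value is arbitrary and only there for reproducibility:

```r
xxx_ <- list(c(13L, 15L, 5L, 6L), c(7L, 20L, 14L, 18L, 1L, 8L, 17L),
             c(9L, 11L, 4L, 12L), c(16L, 19L, 10L, 2L, 3L))

set.seed(42)  # arbitrary seed
m <- t(replicate(10, sapply(xxx_, sample, 1)))

# Convert the 10 x 4 matrix to a data frame and name one column per group.
df <- as.data.frame(m)
names(df) <- paste0("g", seq_along(xxx_))
str(df)
```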

Upvotes: 2
