Reputation: 8308
I have a list in which each entry is a vector of integer values.
I want to sample a single element from each vector and combine the sampled elements into one group.
An example of such a list is:
xxx_ <- list(c(13L, 15L, 5L, 6L), c(7L, 20L, 14L, 18L, 1L, 8L, 17L),
c(9L, 11L, 4L, 12L), c(16L, 19L, 10L, 2L, 3L))
I was doing the following, but it feels like there must be a simpler way to sample from this list:
l_sample <- list()
for (g in 1:10) {
  l <- c()
  for (i in 1:4) {
    # draw one element from the i-th vector
    l <- c(l, sample(xxx_[[i]], 1))
  }
  l_sample[[g]] <- l
}
Which gives the following result:
> l_sample
[[1]]
[1] 15 7 12 10
[[2]]
[1] 6 1 4 2
[[3]]
[1] 13 18 4 19
[[4]]
[1] 6 17 4 2
[[5]]
[1] 15 18 4 3
[[6]]
[1] 13 18 9 3
[[7]]
[1] 6 17 12 19
[[8]]
[1] 5 20 9 19
[[9]]
[1] 5 18 9 10
[[10]]
[1] 13 7 9 3
I also want to append each sample to a data frame as a new row, with each element in its own column, but I couldn't get that to work.
Something like:
> df
  g1 g2 g3 g4
1 15  7 12 10
2  6  1  4  2
...
Would appreciate some help.
Upvotes: 1
Views: 524
Reputation: 4841
If you want to sample 10 times from each of the nested vectors, you can pass size = 10 and replace = TRUE to sample through sapply:
set.seed(1)
sapply(xxx_, sample, 10, TRUE)
#R>       [,1] [,2] [,3] [,4]
#R>  [1,]   13   14    9   19
#R>  [2,]    6   14    4   16
#R>  [3,]    5    7    9    2
#R>  [4,]   13    1    9   16
#R>  [5,]   15    1    9    2
#R>  [6,]   13   20    9   10
#R>  [7,]    5    8   11   19
#R>  [8,]    5    8    9   19
#R>  [9,]   15   20    9    2
#R> [10,]   15   17   11    2
Change the 10 to the number of draws you want to make.
This approach also has the advantage that it will keep the names. As an example, say your data looked like:
xxx_ <- list(
g1 = c(13L, 15L, 5L, 6L ), g2 = c(7L , 20L, 14L, 18L, 1L, 8L, 17L),
g3 = c(9L , 11L, 4L, 12L), g4 = c(16L, 19L, 10L, 2L, 3L))
Then you can do the following to get a data.frame like you request:
set.seed(1)
as.data.frame(sapply(xxx_, sample, 10, TRUE))
#R>    g1 g2 g3 g4
#R> 1  13 14  9 19
#R> 2   6 14  4 16
#R> 3   5  7  9  2
#R> 4  13  1  9 16
#R> 5  15  1  9  2
#R> 6  13 20  9 10
#R> 7   5  8 11 19
#R> 8   5  8  9 19
#R> 9  15 20  9  2
#R> 10 15 17 11  2
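If the list is unnamed, as in the question, the column names can be attached before sampling. The following is a minimal sketch rather than part of the original answer: it assumes the g1, g2, ... naming from the question and uses rbind to append one further draw as a new row. The variable names df and new_row are only illustrative.

```r
# Unnamed list, as in the question
xxx_ <- list(c(13L, 15L, 5L, 6L), c(7L, 20L, 14L, 18L, 1L, 8L, 17L),
             c(9L, 11L, 4L, 12L), c(16L, 19L, 10L, 2L, 3L))

# Assumed naming scheme: g1, g2, ... (one name per group)
names(xxx_) <- paste0("g", seq_along(xxx_))

set.seed(1)
# 10 draws per group; the list names become the column names
df <- as.data.frame(sapply(xxx_, sample, 10, TRUE))
names(df)  # "g1" "g2" "g3" "g4"

# Append one more draw (one element per group) as a new row
new_row <- as.data.frame(t(sapply(xxx_, sample, 1)))
df <- rbind(df, new_row)
nrow(df)   # 11
```

Growing a data frame row by row with rbind is fine for a handful of rows; for many draws it is usually simpler to sample them all at once as above.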
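Passing size to sample directly is also much faster than wrapping a single draw in replicate; in the benchmark below the median time is roughly nine times lower: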
bench::mark(
replicate = t(replicate(10, sapply(xxx_, sample, 1))),
sapply = sapply(xxx_, sample, 10, TRUE),
min_time = 1, check = FALSE)
#R> # A tibble: 2 x 13
#R> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
#R> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
#R> 1 replicate 213.3µs 235µs 4071. 100.3KB 45.7 3384 38 831ms <NULL> <Rprofmem[,3] [43 × 3]> <bch:tm [3,422]> <tibble [3,422 × 3]>
#R> 2 sapply 22.4µs 25.2µs 38417. 10.4KB 46.2 9988 12 260ms <NULL> <Rprofmem[,3] [6 × 3]> <bch:tm [10,000]> <tibble [10,000 × 3]>
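One caveat about the sapply(xxx_, sample, ...) calls above, offered as a side note: when sample is given a single number x >= 1, it samples from 1:x rather than from x itself, so a group that happens to hold just one value (say 7L) could return anything from 1 to 7. The sketch below avoids this by sampling positions with sample.int. The helper name sample_groups and its argument n_draws are hypothetical and do not appear in the original answer.

```r
# Hypothetical helper: draw `n_draws` values per group by sampling positions,
# so a group of length one always returns its single value.
sample_groups <- function(groups, n_draws) {
  sapply(groups, function(x) x[sample.int(length(x), n_draws, replace = TRUE)])
}

xxx_ <- list(c(13L, 15L, 5L, 6L), c(7L, 20L, 14L, 18L, 1L, 8L, 17L),
             c(9L, 11L, 4L, 12L), c(16L, 19L, 10L, 2L, 3L))

set.seed(1)
draws <- sample_groups(xxx_, 10)
dim(draws)  # 10 rows (draws) by 4 columns (groups)

# A length-one group is returned unchanged instead of being read as 1:7
single <- sample_groups(list(7L), 5)
```

Note that with n_draws = 1, sapply simplifies the result to a plain vector rather than a matrix.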
Upvotes: 1
Reputation: 389065
You could use sapply to select one element from each vector and replicate to repeat that 10 times.
t(replicate(10, sapply(xxx_, sample, 1)))
#       [,1] [,2] [,3] [,4]
#  [1,]   15    7    9   10
#  [2,]   15    8    9    3
#  [3,]   13   14    4   19
#  [4,]    5   14   12   10
#  [5,]   13   20    9    3
#  [6,]    5   18   12   16
#  [7,]    5    1   11    2
#  [8,]    6   14   11   19
#  [9,]    5    8   12    3
# [10,]    5   17    4    2
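As a possible variation on the replicate approach, vapply can stand in for sapply so that each draw is checked to be a single integer, and the transposed matrix can be converted to the data frame asked for in the question. This is only a sketch: the named list follows the g1 to g4 naming from the question, and one_draw is a hypothetical helper.

```r
xxx_ <- list(
  g1 = c(13L, 15L, 5L, 6L), g2 = c(7L, 20L, 14L, 18L, 1L, 8L, 17L),
  g3 = c(9L, 11L, 4L, 12L), g4 = c(16L, 19L, 10L, 2L, 3L))

# Hypothetical helper: one element per group, type-checked as a single integer
one_draw <- function() vapply(xxx_, sample, integer(1), size = 1)

set.seed(1)
# replicate() returns a 4 x 10 matrix; transpose so each row is one sample
res <- t(replicate(10, one_draw()))
df <- as.data.frame(res)
names(df)  # "g1" "g2" "g3" "g4"
```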
Upvotes: 2