Reputation: 285
I am looking for an efficient alternative to nested lapply calls, since I understand that nested structures are not well regarded in the R community. Can anyone suggest ideas or an approach for avoiding nested lapply in a custom function?
Here is a quick reproducible example:
a <- data.frame(
  start = seq(1, by = 9, len = 18), stop = seq(6, by = 9, len = 18),
  ID = letters[1:18], score = sample(1:25, 18, replace = FALSE))
b <- data.frame(
  start = seq(2, by = 11, len = 20), stop = seq(8, by = 11, len = 20),
  ID = letters[1:20], score = sample(1:25, 20, replace = FALSE))
c <- data.frame(
  start = seq(4, by = 11, len = 25), stop = seq(9, by = 11, len = 25),
  ID = letters[1:25], score = sample(1:25, 25, replace = FALSE))
a.big <- a[a$score >10,]
a.sml <- a[(a$score > 6 & a$score <= 10),]
a.non <- a[a$score < 6,]
a_new <- list('big'=a.big, 'sml'=a.sml)
tar.list <- list(b,c)
test <- lapply(a_new, function(ele_) {
  re <- lapply(tar.list, function(li) {
    out <- base::setdiff(ele_, li)
    return(out)
  })
})
I want to avoid the nested lapply used to build test and find an efficient alternative. By that I mean a better representation of its output, one that is easy and fast to reproduce and that allows fast, easy downstream computation. Is there a general approach for this?
Can anyone suggest ideas for getting around this issue? Thanks.
Best regards:
Jeff
Upvotes: 2
Views: 3079
Reputation: 5162
Is that what you want?
outd <- function(ele_, li) base::setdiff(ele_, li)
mapply(outd, a_new, tar.list, SIMPLIFY = FALSE)
> mapply(outd, a_new, tar.list, SIMPLIFY = FALSE)
$big
   start stop ID score
1      1    6  a    12
6     46   51  f    20
8     64   69  h    24
9     73   78  i    13
10    82   87  j    11
12   100  105  l    19
14   118  123  n    16
15   127  132  o    18
16   136  141  p    22
17   145  150  q    23
18   154  159  r    14
$sml
  start stop ID score
2    10   15  b     9
7    55   60  g    10
In the case above, mapply applies the function to pairs of the lists' elements: the first element of a_new with the first of tar.list, the second with the second.
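To make the difference between positional pairing and all combinations concrete, here is a minimal sketch on toy vectors. The names xs and ys are hypothetical stand-ins, not the data frames from the question.

```r
# Toy inputs (hypothetical, not the data from the question)
xs <- list(big = 1:5, sml = 1:3)
ys <- list(c(2, 4), c(1, 3))

# mapply pairs elements by position: (big, ys[[1]]) and (sml, ys[[2]])
paired <- mapply(setdiff, xs, ys, SIMPLIFY = FALSE)
length(paired)        # one result per pair

# nested lapply visits every combination of xs and ys
nested <- lapply(xs, function(x) lapply(ys, function(y) setdiff(x, y)))
sum(lengths(nested))  # one result per combination
```

So mapply on its own only replaces the nested lapply when one result per position is wanted, not one per combination.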
If we borrow the idea from outer and expand both lists so that every combination is covered, we get the following (I'm not sure it will work in other cases):
bY <- rep(tar.list, rep.int(length(a_new), length(tar.list)))
bX <- rep(a_new, times = ceiling(length(bY)/length(a_new)))
mapply(outd, bX, bY, SIMPLIFY = FALSE)
> mapply(outd, bX, bY, SIMPLIFY = FALSE)
$big
   start stop ID score
1      1    6  a    25
2     10   15  b    23
4     28   33  d    14
7     55   60  g    19
9     73   78  i    20
10    82   87  j    21
12   100  105  l    13
13   109  114  m    12
14   118  123  n    22
16   136  141  p    15
17   145  150  q    18
$sml
   start stop ID score
6     46   51  f     9
8     64   69  h     8
18   154  159  r    10
$big
   start stop ID score
1      1    6  a    25
2     10   15  b    23
4     28   33  d    14
7     55   60  g    19
9     73   78  i    20
10    82   87  j    21
12   100  105  l    13
13   109  114  m    12
14   118  123  n    22
16   136  141  p    15
17   145  150  q    18
$sml
   start stop ID score
6     46   51  f     9
8     64   69  h     8
18   154  159  r    10
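The expanded result repeats the names big and sml, so it is hard to tell which target each element was compared against. A possible way around that, sketched on toy vectors: the names xs and ys and the "_vs_" label format are assumptions for illustration only.

```r
# Toy inputs (hypothetical); both lists are named so combinations can be labelled
xs <- list(big = 1:5, sml = 1:3)
ys <- list(b = c(2, 4), c = c(1, 3))

# Expand both lists so that position i of bX and bY forms one combination
bY <- rep(ys, each = length(xs))    # b, b, c, c
bX <- rep(xs, times = length(ys))   # big, sml, big, sml

res <- mapply(setdiff, bX, bY, SIMPLIFY = FALSE)

# Replace the duplicated names with one label per combination
names(res) <- paste(names(bX), names(bY), sep = "_vs_")
names(res)
```

With unique names, each result can be looked up directly, for example res[["big_vs_c"]].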
Upvotes: 1
Reputation: 17678
I'm not sure what you really want, but if you would like the setdiff of all combinations of both lists, you can use something like this:
# all combinations
a <- expand.grid(seq_along(a_new), seq_along(tar.list))
a
  Var1 Var2
1    1    1
2    2    1
3    1    2
4    2    2
# apply setdiff over all combinations, row-wise
apply(a, 1, function(x, y, z) { setdiff(y[x[1]], z[x[2]]) }, a_new, tar.list)[1:2]
[[1]]
[[1]][[1]]
   start stop ID score
2     10   15  b    21
3     19   24  c    12
6     46   51  f    23
9     73   78  i    15
10    82   87  j    19
11    91   96  k    25
13   109  114  m    11
16   136  141  p    17
17   145  150  q    18
18   154  159  r    24
[[2]]
[[2]][[1]]
   start stop ID score
5     37   42  e     9
14   118  123  n     8
15   127  132  o     7
Using double brackets [[ ]] extracts the data frames themselves rather than one-element sublists, which gives cleaner output: a single flat list.
apply(a, 1, function(x, y, z) { setdiff(y[[x[1]]], z[[x[2]]]) }, a_new, tar.list)
[[1]]
   start stop ID score
2     10   15  b    21
3     19   24  c    12
6     46   51  f    23
9     73   78  i    15
10    82   87  j    19
11    91   96  k    25
13   109  114  m    11
16   136  141  p    17
17   145  150  q    18
18   154  159  r    24
[[2]]
   start stop ID score
5     37   42  e     9
14   118  123  n     8
15   127  132  o     7
[[3]]
   start stop ID score
2     10   15  b    21
3     19   24  c    12
6     46   51  f    23
9     73   78  i    15
10    82   87  j    19
11    91   96  k    25
13   109  114  m    11
16   136  141  p    17
17   145  150  q    18
18   154  159  r    24
[[4]]
   start stop ID score
5     37   42  e     9
14   118  123  n     8
15   127  132  o     7
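One thing that may be worth checking: base::setdiff is documented for vectors, and results [[1]] and [[3]] above are identical even though they are compared against different targets, so it may not be removing matching rows at all. If a row-wise difference is what is intended, the same expand.grid idea can be combined with Map and a small helper. This is only a sketch under that assumption: row_setdiff, xs, ys and the "_vs_" labels are hypothetical names, not part of the original code.

```r
# Hypothetical helper: rows of x that do not appear in y (all columns compared)
row_setdiff <- function(x, y) {
  # Build one string key per row; "\r" is an unlikely separator in real data
  key <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\r"))
  x[!key(x) %in% key(y), , drop = FALSE]
}

# Toy inputs (hypothetical, much smaller than the data in the question)
xs <- list(
  big = data.frame(id = c("a", "b", "c"), v = c(1, 2, 3), stringsAsFactors = FALSE),
  sml = data.frame(id = c("d", "e"), v = c(4, 5), stringsAsFactors = FALSE))
ys <- list(
  b = data.frame(id = c("a", "d"), v = c(1, 4), stringsAsFactors = FALSE),
  c = data.frame(id = "c", v = 3, stringsAsFactors = FALSE))

# One row per combination, indexed by name instead of by position
grid <- expand.grid(x = names(xs), y = names(ys), stringsAsFactors = FALSE)

# Map walks the two columns in parallel, so there is no nested lapply
res <- Map(function(i, j) row_setdiff(xs[[i]], ys[[j]]), grid$x, grid$y)
names(res) <- paste(grid$x, grid$y, sep = "_vs_")
```

The result is a flat, uniquely named list, and grid doubles as a lookup table if you later want to bind the pieces together or filter by source or target.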
Upvotes: 5