jyson

Reputation: 285

How to avoid using nested lapply in R?

I am looking for an efficient alternative to nested lapply; I think deeply nested structures are discouraged in the R community. Can anyone suggest ideas or an approach for avoiding nested lapply in a custom function?

Here is a quick reproducible example.

Simulated data:

a <- data.frame(
  start = seq(1, by = 9, length.out = 18), stop = seq(6, by = 9, length.out = 18),
  ID = letters[1:18], score = sample(1:25, 18, replace = FALSE))
b <- data.frame(
  start = seq(2, by = 11, length.out = 20), stop = seq(8, by = 11, length.out = 20),
  ID = letters[1:20], score = sample(1:25, 20, replace = FALSE))
c <- data.frame(
  start = seq(4, by = 11, length.out = 25), stop = seq(9, by = 11, length.out = 25),
  ID = letters[1:25], score = sample(1:25, 25, replace = FALSE))

The code that uses nested lapply, which I want to avoid:

a.big <- a[a$score >10,]
a.sml <- a[(a$score > 6 & a$score <= 10),]
a.non <- a[a$score < 6,]

a_new <- list('big'=a.big, 'sml'=a.sml)
tar.list <- list(b,c)

test <- lapply(a_new, function(ele_) {
  lapply(tar.list, function(li) {
    base::setdiff(ele_, li)
  })
})

Objective:

Avoid the nested lapply and find an efficient alternative. That is, I want a better representation for its output, one that is easy and fast to reproduce and that allows easy and fast downstream computation. Is there a general approach for this?

How can I avoid the nested lapply in test? Can anyone suggest ideas for getting around this issue? Thanks.

Best regards,

Jeff

Upvotes: 2

Views: 3079

Answers (2)

Robert

Reputation: 5162

Is that what you want?

outd <- function(ele_, li) base::setdiff(ele_, li)
mapply(outd, a_new, tar.list, SIMPLIFY = FALSE)

> mapply(outd, a_new, tar.list, SIMPLIFY = FALSE)
$big
   start stop ID score
1      1    6  a    12
6     46   51  f    20
8     64   69  h    24
9     73   78  i    13
10    82   87  j    11
12   100  105  l    19
14   118  123  n    16
15   127  132  o    18
16   136  141  p    22
17   145  150  q    23
18   154  159  r    14

$sml
  start stop ID score
2    10   15  b     9
7    55   60  g    10

Edit

In the previous case, mapply applies the function to corresponding pairs of list elements (the first with the first, the second with the second), not to all combinations.
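To make the pairing concrete, here is a minimal sketch on toy vectors. The lists x and y are hypothetical stand-ins for a_new and tar.list, not the data from the question; the sketch only illustrates that mapply walks both lists in parallel, while the nested lapply visits every combination.

```r
# Toy stand-ins (assumed for illustration) for a_new and tar.list
x <- list(big = 1:5, sml = 6:8)
y <- list(c(1, 2), c(6, 7))

# mapply pairs elements by position: (big, y[[1]]) and (sml, y[[2]])
paired <- mapply(setdiff, x, y, SIMPLIFY = FALSE)

# nested lapply compares each element of x with each element of y
nested <- lapply(x, function(xi) lapply(y, function(yj) setdiff(xi, yj)))

length(paired)                             # 2 results
length(unlist(nested, recursive = FALSE))  # 4 results
```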

If we borrow the idea from outer and expand both lists, we get all combinations (I am not sure this will work in other cases):

bY <- rep(tar.list, rep.int(length(a_new), length(tar.list)))
bX <- rep(a_new, times = ceiling(length(bY)/length(a_new)))
mapply(outd, bX, bY, SIMPLIFY = FALSE)

> mapply(outd, bX, bY, SIMPLIFY = FALSE)
$big
   start stop ID score
1      1    6  a    25
2     10   15  b    23
4     28   33  d    14
7     55   60  g    19
9     73   78  i    20
10    82   87  j    21
12   100  105  l    13
13   109  114  m    12
14   118  123  n    22
16   136  141  p    15
17   145  150  q    18

$sml
   start stop ID score
6     46   51  f     9
8     64   69  h     8
18   154  159  r    10

$big
   start stop ID score
1      1    6  a    25
2     10   15  b    23
4     28   33  d    14
7     55   60  g    19
9     73   78  i    20
10    82   87  j    21
12   100  105  l    13
13   109  114  m    12
14   118  123  n    22
16   136  141  p    15
17   145  150  q    18

$sml
   start stop ID score
6     46   51  f     9
8     64   69  h     8
18   154  159  r    10
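Because the expanded result repeats the names big and sml, it can be hard to tell which target each element was compared with. A possible refinement, sketched here on toy vectors (x, y, xs, ys and the "_vs_" labels are assumptions for illustration, not part of the original code), is to name the targets and build a label per combination:

```r
# Toy stand-ins (assumed for illustration) for a_new and tar.list
x <- list(big = 1:5, sml = 6:8)
y <- list(b = c(1, 2), c = c(6, 7))

# Expand both lists so that every element of x meets every element of y
ys <- rep(y, each = length(x))   # b, b, c, c
xs <- rep(x, times = length(y))  # big, sml, big, sml

res <- mapply(setdiff, xs, ys, SIMPLIFY = FALSE)

# Label each result with the pair it came from
names(res) <- paste(names(xs), names(ys), sep = "_vs_")
names(res)
```

With labels like these, a single combination can be pulled out by name, for example res$big_vs_c.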

Upvotes: 1

Roman

Reputation: 17678

I'm not sure what you really want. But if you want the setdiff of all combinations of both lists, you can use something like this:

# all combinations of indices (note: this reuses the name a, overwriting the data frame a from the question)
a <- expand.grid(seq_along(a_new), seq_along(tar.list))
a
  Var1 Var2
1    1    1
2    2    1
3    1    2
4    2    2
# apply setdiff row-wise over all combinations
apply(a, 1, function(x, y, z){ setdiff(y[x[1]], z[x[2]])}, a_new, tar.list)[1:2]
[[1]]
[[1]][[1]]
   start stop ID score
2     10   15  b    21
3     19   24  c    12
6     46   51  f    23
9     73   78  i    15
10    82   87  j    19
11    91   96  k    25
13   109  114  m    11
16   136  141  p    17
17   145  150  q    18
18   154  159  r    24


[[2]]
[[2]][[1]]
   start stop ID score
5     37   42  e     9
14   118  123  n     8
15   127  132  o     7

Using double [[ ]] brackets extracts the data frames themselves rather than one-element lists, which gives cleaner output as a single flat list.

apply(a, 1, function(x, y, z){ setdiff(y[[x[1]]],z[[x[2]]])}, a_new, tar.list)

[[1]]
   start stop ID score
2     10   15  b    21
3     19   24  c    12
6     46   51  f    23
9     73   78  i    15
10    82   87  j    19
11    91   96  k    25
13   109  114  m    11
16   136  141  p    17
17   145  150  q    18
18   154  159  r    24

[[2]]
   start stop ID score
5     37   42  e     9
14   118  123  n     8
15   127  132  o     7

[[3]]
   start stop ID score
2     10   15  b    21
3     19   24  c    12
6     46   51  f    23
9     73   78  i    15
10    82   87  j    19
11    91   96  k    25
13   109  114  m    11
16   136  141  p    17
17   145  150  q    18
18   154  159  r    24

[[4]]
   start stop ID score
5     37   42  e     9
14   118  123  n     8
15   127  132  o     7
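The same index-grid idea can also be written with Map, which iterates over the index columns directly instead of going through apply's row-by-row matrix conversion. This is only a sketch on toy vectors; x, y, idx and the "_vs_" labels are assumed names for illustration, standing in for a_new, tar.list and the grid above.

```r
# Toy stand-ins (assumed for illustration) for a_new and tar.list
x <- list(big = 1:5, sml = 6:8)
y <- list(b = c(1, 2), c = c(6, 7))

# One row per (i, j) combination of positions in x and y
idx <- expand.grid(i = seq_along(x), j = seq_along(y))

# [[ ]] extracts the elements themselves, so the result is one flat list
res <- Map(function(i, j) setdiff(x[[i]], y[[j]]), idx$i, idx$j)

# Label each result with the combination it came from
names(res) <- paste(names(x)[idx$i], names(y)[idx$j], sep = "_vs_")
names(res)
```

Keeping idx around means each result can be traced back to its row in the grid, which may help in downstream steps.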

Upvotes: 5
