Rich Scriven

Reputation: 99361

A more efficient way to parse the elements of a nested list

I'm developing a function that parses a nested list. Unfortunately, because of the nature of the raw data, there's really no way I can think of to get around doing it this way. The final three bits of code in the function scare me a little, but they do get the job done. Here they are:

mkList <- lapply(rec, function(x) {
  lapply(regex, function(y) grep(y, x, value = TRUE))
})
rem <- lapply(mkList, function(x) {
  lapply(x, function(y) sub("[a-z]+,", "", y))
})
lapply(rem, read.as.csv)

Yes, you're seeing that correctly: that's five consecutive calls to lapply. And yes, you guessed it, read.as.csv also calls lapply.


To make a small reproducible example, consider the nested list x and the "double" lapply chunk that follows it. The result is exactly what I want, but I'm curious:

Is there a better, more efficient way to apply a function to the inner list of a nested list?

The inner list elements are csv vectors of varying string length.

> ( x <- list(list(a = c("a,b,c", "d,e,f"), 
                   b = c("1,2,a,b,c,d", "3,4,e,f,g,h"))) )

# [[1]]
# [[1]]$a
# [1] "a,b,c" "d,e,f"
#
# [[1]]$b
# [1] "1,2,a,b,c,d" "3,4,e,f,g,h"

> lapply(x, function(y){
      lapply(y, function(z) do.call(rbind, strsplit(z, ",")))
  })

# [[1]]
# [[1]]$a
#      [,1] [,2] [,3]
# [1,] "a"  "b"  "c" 
# [2,] "d"  "e"  "f" 
# 
# [[1]]$b
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] "1"  "2"  "a"  "b"  "c"  "d" 
# [2,] "3"  "4"  "e"  "f"  "g"  "h" 

Upvotes: 1

Views: 390

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193667

Among the lesser-known functions in the *apply family is rapply, short for "recursive lapply". It applies a function to every non-list leaf of a nested list, which seems to be what you're trying to do:

rapply(x, function(y) do.call(rbind, strsplit(y, ",", fixed = TRUE)), how = "replace")
# [[1]]
# [[1]]$a
#      [,1] [,2] [,3]
# [1,] "a"  "b"  "c" 
# [2,] "d"  "e"  "f" 
# 
# [[1]]$b
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] "1"  "2"  "a"  "b"  "c"  "d" 
# [2,] "3"  "4"  "e"  "f"  "g"  "h" 
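
If you want to convince yourself that this is a drop-in replacement, here is a minimal sketch that compares the rapply result with the nested lapply from the question. The helper name split_csv is made up for illustration, and classes = "character" is an optional extra I've added: with how = "replace" it leaves any non-character leaves untouched.

```r
# Example list from the question.
x <- list(list(a = c("a,b,c", "d,e,f"),
               b = c("1,2,a,b,c,d", "3,4,e,f,g,h")))

# Hypothetical helper: split each CSV string and stack the pieces as rows.
split_csv <- function(v) do.call(rbind, strsplit(v, ",", fixed = TRUE))

# Nested lapply, as in the question.
nested <- lapply(x, function(y) lapply(y, split_csv))

# Single recursive pass over all character leaves.
recursive <- rapply(x, split_csv, classes = "character", how = "replace")

# Compare the two results structurally.
identical(nested, recursive)
```

Note that rapply descends to whatever depth the list has, so the same call should keep working if another level of nesting appears, whereas the lapply version needs one more wrapper per level.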

On the small list x from your question, rapply is a shade slower than your nested lapply approach, but as you scale the example up, it proves to be more efficient.
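
To check the timing on your own machine, something along these lines should do. It is only a rough sketch using base system.time: the replication count of 1e4 and the names big and split_csv are arbitrary choices of mine, and the numbers will vary with hardware and R version.

```r
# Scale the question's example up; 1e4 copies is an arbitrary size.
x   <- list(list(a = c("a,b,c", "d,e,f"),
                 b = c("1,2,a,b,c,d", "3,4,e,f,g,h")))
big <- rep(x, 1e4)

# Hypothetical helper: split each CSV string and stack the pieces as rows.
split_csv <- function(v) do.call(rbind, strsplit(v, ",", fixed = TRUE))

# Time the nested lapply from the question.
t_nested <- system.time(
  res_nested <- lapply(big, function(y) lapply(y, split_csv))
)

# Time the single rapply call.
t_rapply <- system.time(
  res_rapply <- rapply(big, split_csv, how = "replace")
)

# Elapsed seconds for each approach; compare them locally.
c(nested_lapply = t_nested[["elapsed"]], rapply = t_rapply[["elapsed"]])
```

For a more careful comparison you would want repeated runs rather than a single system.time call each.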

Upvotes: 3
