M. Wood

Reputation: 587

Pull all elements with a specific name from a nested list

I have some archived Slack data from which I am trying to extract a few key message properties. I had been doing this by naively flattening the entire list, which gave me a data.frame or tibble with lists nested in some cells. As this dataset grows, I want to pick elements out of the list more selectively, so that building a data.frame or tibble with just the elements I want doesn't take forever.

In the example below, I am trying to pull everything named "type" into a vector or flat list that I can use as a data frame variable. I named the folder and message levels for convenience. Does anyone have model code that can help?

library(tidyverse)
    
l <- list(
  folder_1 = list(
    `msg_1-1` = list(type = "message",
                     subtype = "channel_join",
                     ts = "1585771048.000200",
                     user = "UFUNNF8MA",
                     text = "<@UFUNNF8MA> has joined the channel"),
    `msg_1-2` = list(type = "message",
                     subtype = "channel_purpose",
                     ts = "1585771049.000300",
                     user = "UNFUNQ8MA",
                     text = "<@UNFUNQ8MA> set the channel purpose: Talk about xyz")),
  folder_2 = list(
    `msg_2-1` = list(type = "message",
                     subtype = "channel_join",
                     ts = "1585771120.000200",
                     user = "UQKUNF8MA",
                     text = "<@UQKUNF8MA> has joined the channel"))
)

# gets a specific element
print(l[[1]][[1]][["type"]])

# Tried to get all elements named "type", but map() only looks one level down
# (at the folders), so this returns NULL for each folder
print(purrr::map(l, "type"))

Upvotes: 7

Views: 2174

Answers (5)

Simon C.

Reputation: 1067

Alright, I wanted a base R solution, and wasn't satisfied with @Allan Cameron's answer because I wanted all matches grouped together in a single list at the same 'root' level. I didn't want to use unlist() to do so, as the matched objects may be complex tables and I don't want to lose their structure. I thought append() might do the trick, and after playing with it a bit I have something that seems to work (at least in my case and the OP's):

I reused Allan's names:

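get_elements <- function(x, element) {
  newlist <- list()
  for (elt in names(x)) {
    # keep a match as a named one-element list, preserving its structure
    if (elt == element) newlist <- append(newlist, x[elt])
    # otherwise descend into sub-lists and splice in whatever they return
    else if (is.list(x[[elt]])) newlist <- append(newlist, get_elements(x[[elt]], element))
  }
  return(newlist)
}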

Less elegant than lapply() to my taste, but I am not sure I could do what I want with any *apply function. I still feel something simpler and nicer could be done (maybe with do.call()?), but I can't find it...

Results with the OP's list:

> get_elements(l,"user")
$user
[1] "UFUNNF8MA"

$user
[1] "UNFUNQ8MA"

$user
[1] "UQKUNF8MA"

> get_elements(l,"type")
$type
[1] "message"

$type
[1] "message"

$type
[1] "message"

Upvotes: 1

M. Wood

Reputation: 587

Related to the answers from @Duck and @Abdessabour Mtk yesterday: purrr has a function, map_depth(), that lets you extract a named element if you know its name and how deep it sits in the hierarchy. It is REALLY useful when crawling big nested lists like these, and it is simpler than the nested map() calls in @Duck's answer.

purrr::map_depth(l, 2, "type")
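A sketch of going from map_depth() to the data frame the question asks for, assuming every message carries each field being extracted (otherwise unlist() drops the missing ones and the columns no longer line up). The list is an abbreviated copy of the question's l, keeping only the fields used here:

```r
library(purrr)

# Abbreviated copy of the question's list: only the fields used below
l <- list(
  folder_1 = list(
    `msg_1-1` = list(type = "message", user = "UFUNNF8MA"),
    `msg_1-2` = list(type = "message", user = "UNFUNQ8MA")
  ),
  folder_2 = list(
    `msg_2-1` = list(type = "message", user = "UQKUNF8MA")
  )
)

# .depth = 2 applies the "type"/"user" extractor to each message inside
# each folder; unlist() flattens the result into a named character vector
types <- unlist(map_depth(l, 2, "type"))
users <- unlist(map_depth(l, 2, "user"))

df <- data.frame(
  id   = names(types),  # "folder.message", e.g. "folder_1.msg_1-1"
  type = unname(types),
  user = unname(users),
  stringsAsFactors = FALSE
)
print(df)
```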

Upvotes: 3

Joris C.

Reputation: 6234

Another option is to use rrapply() from the rrapply package:

library(rrapply)

## return unlisted vector
rrapply(l, condition = function(x, .xname) .xname == "type", how = "unlist")
#> folder_1.msg_1-1.type folder_1.msg_1-2.type folder_2.msg_2-1.type 
#>             "message"             "message"             "message"

## return melted data.frame
rrapply(l, condition = function(x, .xname) .xname == "type", how = "melt")
#>         L1      L2   L3   value
#> 1 folder_1 msg_1-1 type message
#> 2 folder_1 msg_1-2 type message
#> 3 folder_2 msg_2-1 type message

Upvotes: 2

Allan Cameron

Reputation: 173858

Depending on the desired output, I would probably use a simple recursive function here.

get_elements <- function(x, element) {
  if(is.list(x))
  {
    if(element %in% names(x)) x[[element]]
    else lapply(x, get_elements, element = element)
  }
}

This allows:

get_elements(l, "type")
#> $folder_1
#> $folder_1$`msg_1-1`
#> [1] "message"
#> 
#> $folder_1$`msg_1-2`
#> [1] "message"
#> 
#> 
#> $folder_2
#> $folder_2$`msg_2-1`
#> [1] "message"

Or if you want to get all "users":

get_elements(l, "user")
#> $folder_1
#> $folder_1$`msg_1-1`
#> [1] "UFUNNF8MA"
#> 
#> $folder_1$`msg_1-2`
#> [1] "UNFUNQ8MA"
#> 
#> 
#> $folder_2
#> $folder_2$`msg_2-1`
#> [1] "UQKUNF8MA"

You could obviously unlist the result if you prefer it flattened into a vector.

unlist(get_elements(l, "type"))
#> folder_1.msg_1-1 folder_1.msg_1-2 folder_2.msg_2-1 
#>        "message"        "message"        "message" 
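One caveat if you plan to unlist() several fields and bind them as columns: get_elements() returns NULL for messages that lack the field, and unlist() silently drops those, so the vectors can come out with different lengths and misaligned. A minimal base R alternative, assuming (as in the question) that messages always sit exactly two levels down and that each field holds a single string. pluck_field() is a hypothetical helper, and msg_2-1's subtype is removed on purpose to show the missing-field case:

```r
# Abbreviated copy of the question's list; msg_2-1 has no "subtype" here
l <- list(
  folder_1 = list(
    `msg_1-1` = list(type = "message", subtype = "channel_join", user = "UFUNNF8MA"),
    `msg_1-2` = list(type = "message", subtype = "channel_purpose", user = "UNFUNQ8MA")
  ),
  folder_2 = list(
    `msg_2-1` = list(type = "message", user = "UQKUNF8MA")
  )
)

# Drop the folder level: a flat list of messages named "folder.message"
msgs <- unlist(l, recursive = FALSE)

# Hypothetical helper: one value per message, NA when the field is absent
pluck_field <- function(messages, field) {
  vapply(messages, function(m) {
    v <- m[[field]]
    if (is.null(v)) NA_character_ else as.character(v)[1]
  }, character(1), USE.NAMES = FALSE)
}

df <- data.frame(
  id      = names(msgs),
  type    = pluck_field(msgs, "type"),
  subtype = pluck_field(msgs, "subtype"),
  user    = pluck_field(msgs, "user"),
  stringsAsFactors = FALSE
)
print(df)
```

Because each column is built from the same list of messages, rows stay aligned even when a field is missing.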

Upvotes: 5

Duck

Reputation: 39595

As the OP mentioned, flattening everything with unlist() and then filtering on the names solves the issue:

# Flatten once, then keep only the entries whose name ends in ".type"
u <- unlist(l)
u[grepl("\\.type$", names(u))]

Output:

folder_1.msg_1-1.type folder_1.msg_1-2.type folder_2.msg_2-1.type 
            "message"             "message"             "message" 

Another option (many thanks and credit to @Abdessabour Mtk):

# Outer map() walks the folders, inner map() pulls "type" from each message
purrr::map(l, ~ purrr::map(.x, "type"))

Upvotes: 2
