Reputation: 587
I have some archived Slack data from which I am trying to extract some key message properties. So far I have done this by naively flattening the entire list, which yields a data.frame or tibble with lists nested in some cells. As the dataset gets bigger, I want to pick elements out of the list more selectively, so that building the data.frame or tibble with the elements I want doesn't take forever once the cache is large.
In the example below, I am trying to pull everything named "type" into a vector or flat list that I can then use as a data frame variable. I named the folder and message levels for convenience. Does anyone have model code that can help?
library(tidyverse)
l <- list(
  folder_1 = list(
    `msg_1-1` = list(type = "message",
                     subtype = "channel_join",
                     ts = "1585771048.000200",
                     user = "UFUNNF8MA",
                     text = "<@UFUNNF8MA> has joined the channel"),
    `msg_1-2` = list(type = "message",
                     subtype = "channel_purpose",
                     ts = "1585771049.000300",
                     user = "UNFUNQ8MA",
                     text = "<@UNFUNQ8MA> set the channel purpose: Talk about xyz")),
  folder_2 = list(
    `msg_2-1` = list(type = "message",
                     subtype = "channel_join",
                     ts = "1585771120.000200",
                     user = "UQKUNF8MA",
                     text = "<@UQKUNF8MA> has joined the channel"))
)
# gets a specific element
print(l[[1]][[1]][["type"]])
# tried to get all elements named "type", but am not at the right list level to do so
print(purrr::map(l, "type"))
Upvotes: 7
Views: 2174
Reputation: 1067
I wanted a base R solution and wasn't satisfied with @Allan Cameron's answer, because I wanted all matches grouped together in a final list at the same 'root' level. I didn't want to use unlist to do so, since the matched objects may be complex tables and I don't want to lose their structure. I thought that append might do the trick, and after playing with it a bit I got something that seems to work (at least in my case and OP's).
I used Allan's names:
get_elements <- function(x, element) {
  newlist <- list()
  for (elt in names(x)) {
    if (elt == element) {
      newlist <- append(newlist, x[elt])
    } else if (is.list(x[[elt]])) {
      newlist <- append(newlist, get_elements(x[[elt]], element))
    }
  }
  return(newlist)
}
This is less elegant than lapply (to my taste), but I am not sure I could do what I want with any *apply function. I still feel something simpler and nicer could be done (maybe with do.call?), but I can't find it.
Results with OP's list:
> get_elements(l,"user")
$user
[1] "UFUNNF8MA"
$user
[1] "UNFUNQ8MA"
$user
[1] "UQKUNF8MA"
> get_elements(l,"type")
$type
[1] "message"
$type
[1] "message"
$type
[1] "message"
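To illustrate the point about preserving structure, here is a minimal sketch in which the matched elements are data frames rather than strings. The list `nested` and the element name `tbl` are made up for this example; the helper is the same append-based function as above, repeated so the block runs on its own.

```r
# Same append-based helper as in this answer, repeated for self-containment.
get_elements <- function(x, element) {
  newlist <- list()
  for (elt in names(x)) {
    if (elt == element) {
      newlist <- append(newlist, x[elt])
    } else if (is.list(x[[elt]])) {
      newlist <- append(newlist, get_elements(x[[elt]], element))
    }
  }
  newlist
}

# Hypothetical nested list: the elements named "tbl" sit at different depths.
nested <- list(
  a = list(tbl = data.frame(id = 1:2, val = c("x", "y")), note = "first"),
  b = list(inner = list(tbl = data.frame(id = 3L, val = "z")))
)

res <- get_elements(nested, "tbl")

length(res)                             # number of matches found
vapply(res, is.data.frame, logical(1))  # each match is still a data frame
```

Because the matches are collected with append rather than unlist, each data frame comes back intact in a flat list, regardless of how deep it was nested.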
Upvotes: 1
Reputation: 587
Related to the answers provided by @Duck and @Abdessabour Mtk yesterday: purrr has a function, map_depth(), that lets you extract a named element if you know its name and how deep it sits in the hierarchy. It is really useful when crawling these big nested lists, and is simpler than nested map() calls.
purrr::map_depth(l, 2, "type")
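Since the goal is a vector that can become a data frame column, the nested result of map_depth() can be flattened with unlist(). The sketch below uses a trimmed copy of the question's list (only two fields per message, to keep it short) and assumes every message actually has the requested field; a message without it would yield NULL and be silently dropped by unlist().

```r
library(purrr)

# Trimmed copy of the question's list (assumption: only two fields kept).
l <- list(
  folder_1 = list(
    `msg_1-1` = list(type = "message", user = "UFUNNF8MA"),
    `msg_1-2` = list(type = "message", user = "UNFUNQ8MA")
  ),
  folder_2 = list(
    `msg_2-1` = list(type = "message", user = "UQKUNF8MA")
  )
)

# Depth 2 is the message level; the string "user" acts as an extractor.
users <- unlist(map_depth(l, 2, "user"))
users
```

The names of the resulting vector combine the folder and message names, which can help trace each value back to its source.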
Upvotes: 3
Reputation: 6234
Another option is to use rrapply() from the rrapply package:
library(rrapply)
## return unlisted vector
rrapply(l, condition = function(x, .xname) .xname == "type", how = "unlist")
#> folder_1.msg_1-1.type folder_1.msg_1-2.type folder_2.msg_2-1.type
#> "message" "message" "message"
## return melted data.frame
rrapply(l, condition = function(x, .xname) .xname == "type", how = "melt")
#>         L1      L2   L3   value
#> 1 folder_1 msg_1-1 type message
#> 2 folder_1 msg_1-2 type message
#> 3 folder_2 msg_2-1 type message
Upvotes: 2
Reputation: 173858
Depending on the desired output, I would probably use a simple recursive function here.
get_elements <- function(x, element) {
  if (is.list(x)) {
    if (element %in% names(x)) x[[element]]
    else lapply(x, get_elements, element = element)
  }
}
This allows:
get_elements(l, "type")
#> $folder_1
#> $folder_1$`msg_1-1`
#> [1] "message"
#>
#> $folder_1$`msg_1-2`
#> [1] "message"
#>
#>
#> $folder_2
#> $folder_2$`msg_2-1`
#> [1] "message"
Or if you want to get all "users":
get_elements(l, "user")
#> $folder_1
#> $folder_1$`msg_1-1`
#> [1] "UFUNNF8MA"
#>
#> $folder_1$`msg_1-2`
#> [1] "UNFUNQ8MA"
#>
#>
#> $folder_2
#> $folder_2$`msg_2-1`
#> [1] "UQKUNF8MA"
You could obviously unlist the result if you prefer it flattened into a vector.
unlist(get_elements(l, "type"))
#> folder_1.msg_1-1 folder_1.msg_1-2 folder_2.msg_2-1
#> "message" "message" "message"
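Building on this, one possible way to get several fields into a data frame at once is to call the function once per field and bind the flattened results as columns. This is only a sketch: the `fields` vector and the trimmed copy of the list are assumptions for the example, and it relies on every message having every requested field (otherwise the columns would differ in length and as.data.frame() would fail).

```r
# Recursive helper from this answer, repeated so the block runs on its own.
get_elements <- function(x, element) {
  if (is.list(x)) {
    if (element %in% names(x)) x[[element]]
    else lapply(x, get_elements, element = element)
  }
}

# Trimmed copy of the question's list (assumption: three fields kept).
l <- list(
  folder_1 = list(
    `msg_1-1` = list(type = "message", subtype = "channel_join", user = "UFUNNF8MA"),
    `msg_1-2` = list(type = "message", subtype = "channel_purpose", user = "UNFUNQ8MA")
  ),
  folder_2 = list(
    `msg_2-1` = list(type = "message", subtype = "channel_join", user = "UQKUNF8MA")
  )
)

# Hypothetical choice of fields to extract, one column per field.
fields <- c("type", "subtype", "user")
cols <- lapply(setNames(fields, fields), function(f) unlist(get_elements(l, f)))
df <- as.data.frame(cols, stringsAsFactors = FALSE)
df
```

Each pass walks the whole list again, so for many fields on a very large archive a single traversal may be preferable.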
Upvotes: 5
Reputation: 39595
As OP mentioned, this can solve the issue:
#Code
unlist(l)[grepl('.type', names(unlist(l)), fixed = TRUE)]
Output:
folder_1.msg_1-1.type folder_1.msg_1-2.type folder_2.msg_2-1.type
"message" "message" "message"
Another option (many thanks and credit to @Abdessabour Mtk):
#Code1
purrr::map(l, ~ purrr::map(.x, "type"))
Upvotes: 2