Extract elements from different levels of a nested list

Question

I have a nested list of academic authors such as:

> str(content)
List of 3
 $ author-retrieval-response:List of 1
  ..$ :List of 6
  .. ..$ @status       : chr "found"
  .. ..$ @_fa          : chr "true"
  .. ..$ coredata      :List of 3
  .. .. ..$ dc:identifier : chr "AUTHOR_ID:55604964500"
  .. .. ..$ document-count: chr "6"
  .. .. ..$ cited-by-count: chr "13"
  .. ..$ h-index       : chr "3"
  .. ..$ coauthor-count: chr "7"
  .. ..$ preferred-name:List of 2
  .. .. ..$ surname   : chr "García Cruz"
  .. .. ..$ given-name: chr "Gustavo Adolfo"
 $ author-retrieval-response:List of 1
  ..$ :List of 6
  .. ..$ @status       : chr "found"
  .. ..$ @_fa          : chr "true"
  .. ..$ coredata      :List of 3
  .. .. ..$ dc:identifier : chr "AUTHOR_ID:56595713900"
  .. .. ..$ document-count: chr "4"
  .. .. ..$ cited-by-count: chr "21"
  .. ..$ h-index       : chr "3"
  .. ..$ coauthor-count: chr "5"
  .. ..$ preferred-name:List of 2
  .. .. ..$ surname   : chr "Akimov"
  .. .. ..$ given-name: chr "Alexey"
 $ author-retrieval-response:List of 1
  ..$ :List of 6
  .. ..$ @status       : chr "found"
  .. ..$ @_fa          : chr "true"
  .. ..$ coredata      :List of 3
  .. .. ..$ dc:identifier : chr "AUTHOR_ID:12792624600"
  .. .. ..$ document-count: chr "10"
  .. .. ..$ cited-by-count: chr "117"
  .. ..$ h-index       : chr "6"
  .. ..$ coauthor-count: chr "7"
  .. ..$ preferred-name:List of 2
  .. .. ..$ surname   : chr "Alecke"
  .. .. ..$ given-name: chr "Björn"

I am interested in extracting the following values:

dc:identifier, document-count, cited-by-count, h-index, coauthor-count, surname, given-name

I would then like to parse them into a data-frame-like structure.

I have two issues. The first is that I cannot access the different levels of my list. While content[[3]] returns the elements of the third sub-list/author, I have not found a way to access the sublists of the third author, that is:

> content[[3]][[2]]
Error in content[[3]][[2]] : subscript out of bounds

I also imagine that once I can access them, I cannot simply use sapply, as the elements I'd like to extract from my list are not all at the same level.

I paste the dput of the first three elements of my list:

structure(list(`author-retrieval-response` = list(structure(list(
    `@status` = "found", `@_fa` = "true", coredata = structure(list(
        `dc:identifier` = "AUTHOR_ID:55604964500", `document-count` = "6", 
        `cited-by-count` = "13"), .Names = c("dc:identifier", 
    "document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "7", 
    `preferred-name` = structure(list(surname = "García Cruz", 
        `given-name` = "Gustavo Adolfo"), .Names = c("surname", 
    "given-name"))), .Names = c("@status", "@_fa", "coredata", 
"h-index", "coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
    structure(list(`@status` = "found", `@_fa` = "true", coredata = structure(list(
        `dc:identifier` = "AUTHOR_ID:56595713900", `document-count` = "4", 
        `cited-by-count` = "21"), .Names = c("dc:identifier", 
    "document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "5", 
        `preferred-name` = structure(list(surname = "Akimov", 
            `given-name` = "Alexey"), .Names = c("surname", "given-name"
        ))), .Names = c("@status", "@_fa", "coredata", "h-index", 
    "coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
    structure(list(`@status` = "found", `@_fa` = "true", coredata = structure(list(
        `dc:identifier` = "AUTHOR_ID:12792624600", `document-count` = "10", 
        `cited-by-count` = "117"), .Names = c("dc:identifier", 
    "document-count", "cited-by-count")), `h-index` = "6", `coauthor-count` = "7", 
        `preferred-name` = structure(list(surname = "Alecke", 
            `given-name` = "Björn"), .Names = c("surname", "given-name"
        ))), .Names = c("@status", "@_fa", "coredata", "h-index", 
    "coauthor-count", "preferred-name")))), .Names = c("author-retrieval-response", 
"author-retrieval-response", "author-retrieval-response"))

Many thanks for your help!

Parfait · Accepted Answer

Consider using rapply (a recursive version of lapply) to flatten all nested child and grandchild elements of each author, inside an lapply call that runs across the three top-level elements. Then transpose each flattened result with t() and pass it to data.frame(), so that every author becomes a one-row data frame.

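# my_list is the nested list from the question (called content there).
# rapply() flattens one author into a named character vector; t() turns it into a single row.
flat_list <- lapply(my_list, function(author) data.frame(t(rapply(author, function(value) value[1]))))

# unname() drops the repeated "author-retrieval-response" names before the rows are stacked.
final_df <- do.call(rbind, unname(flat_list))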

Output

final_df

#   X.status X._fa coredata.dc.identifier coredata.document.count coredata.cited.by.count h.index coauthor.count preferred.name.surname preferred.name.given.name
# 1    found  true  AUTHOR_ID:55604964500                       6                      13       3              7            García Cruz            Gustavo Adolfo
# 2    found  true  AUTHOR_ID:56595713900                       4                      21       3              5                 Akimov                    Alexey
# 3    found  true  AUTHOR_ID:12792624600                      10                     117       6              7                 Alecke                     Björn
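
On the first issue in the question: each top-level element wraps the author record in an unnamed list of length 1, so content[[3]] has only one element and content[[3]][[2]] is out of bounds. The record itself is content[[3]][[1]]. If only the seven requested fields are needed, a minimal sketch along these lines may be enough. The helper name extract_author, the result name authors_df, the column names, and the conversion of the counts to integers are my own choices for illustration and are not part of the answer above. The sketch assumes every author has all seven fields and rebuilds a two-author stand-in for content so that it runs on its own.

```r
# Two-author stand-in with the same shape as the list in the question.
content <- list(
  `author-retrieval-response` = list(list(
    `@status` = "found", `@_fa` = "true",
    coredata = list(`dc:identifier` = "AUTHOR_ID:56595713900",
                    `document-count` = "4", `cited-by-count` = "21"),
    `h-index` = "3", `coauthor-count` = "5",
    `preferred-name` = list(surname = "Akimov", `given-name` = "Alexey")
  )),
  `author-retrieval-response` = list(list(
    `@status` = "found", `@_fa` = "true",
    coredata = list(`dc:identifier` = "AUTHOR_ID:12792624600",
                    `document-count` = "10", `cited-by-count` = "117"),
    `h-index` = "6", `coauthor-count` = "7",
    `preferred-name` = list(surname = "Alecke", `given-name` = "Björn")
  ))
)

# Hypothetical helper: pull the requested fields out of one top-level element.
extract_author <- function(x) {
  rec <- x[[1]]  # step through the unnamed wrapper list of length 1
  data.frame(
    identifier     = rec$coredata$`dc:identifier`,
    document_count = as.integer(rec$coredata$`document-count`),
    cited_by_count = as.integer(rec$coredata$`cited-by-count`),
    h_index        = as.integer(rec$`h-index`),
    coauthor_count = as.integer(rec$`coauthor-count`),
    surname        = rec$`preferred-name`$surname,
    given_name     = rec$`preferred-name`$`given-name`,
    stringsAsFactors = FALSE
  )
}

# unname() avoids carrying the duplicated top-level names into the row names.
authors_df <- do.call(rbind, lapply(unname(content), extract_author))
```

Compared with the rapply approach, this keeps only the columns of interest and gives them syntactic names, at the cost of hard-coding the path to each field. If some authors can lack a field, the helper would need a fallback such as NA for the missing value.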
