Reputation: 141
I have a nested list of academic authors such as:
> str(content)
List of 3
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ @status : chr "found"
.. ..$ @_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:55604964500"
.. .. ..$ document-count: chr "6"
.. .. ..$ cited-by-count: chr "13"
.. ..$ h-index : chr "3"
.. ..$ coauthor-count: chr "7"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "García Cruz"
.. .. ..$ given-name: chr "Gustavo Adolfo"
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ @status : chr "found"
.. ..$ @_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:56595713900"
.. .. ..$ document-count: chr "4"
.. .. ..$ cited-by-count: chr "21"
.. ..$ h-index : chr "3"
.. ..$ coauthor-count: chr "5"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "Akimov"
.. .. ..$ given-name: chr "Alexey"
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ @status : chr "found"
.. ..$ @_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:12792624600"
.. .. ..$ document-count: chr "10"
.. .. ..$ cited-by-count: chr "117"
.. ..$ h-index : chr "6"
.. ..$ coauthor-count: chr "7"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "Alecke"
.. .. ..$ given-name: chr "Björn"
I am interested in extracting the following values:
dc:identifier, document-count, cited-by-count, h-index, coauthor-count, surname, given-name
and parsing them into a data-frame-like structure.
I have two issues. The first is that I cannot access the different levels of my list. While content[[3]]
returns the elements of the third sub-list/author, I have not found a way to access the sublists of the third author, that is:
> content[[3]][[2]]
Error in content[[3]][[2]] : subscript out of bounds
The second is that, even once I can access them, I imagine I cannot simply use sapply,
since the elements I'd like to extract from my list are not all at the same level.
Here is the dput output
of the first three elements of my list:
structure(list(`author-retrieval-response` = list(structure(list(
`@status` = "found", `@_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:55604964500", `document-count` = "6",
`cited-by-count` = "13"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "7",
`preferred-name` = structure(list(surname = "García Cruz",
`given-name` = "Gustavo Adolfo"), .Names = c("surname",
"given-name"))), .Names = c("@status", "@_fa", "coredata",
"h-index", "coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
structure(list(`@status` = "found", `@_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:56595713900", `document-count` = "4",
`cited-by-count` = "21"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "5",
`preferred-name` = structure(list(surname = "Akimov",
`given-name` = "Alexey"), .Names = c("surname", "given-name"
))), .Names = c("@status", "@_fa", "coredata", "h-index",
"coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
structure(list(`@status` = "found", `@_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:12792624600", `document-count` = "10",
`cited-by-count` = "117"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "6", `coauthor-count` = "7",
`preferred-name` = structure(list(surname = "Alecke",
`given-name` = "Björn"), .Names = c("surname", "given-name"
))), .Names = c("@status", "@_fa", "coredata", "h-index",
"coauthor-count", "preferred-name")))), .Names = c("author-retrieval-response",
"author-retrieval-response", "author-retrieval-response"))
Many thanks for your help!
Upvotes: 2
Views: 722
Reputation: 107587
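Consider using rapply
(a recursive apply function) to flatten all nested child and grandchild elements of each author, inside an lapply
that runs across the three top-level elements. Transpose each flattened result with t()
and pass it to the data.frame()
constructor, so that every author becomes a one-row data frame. Finally, stack those rows with do.call(rbind, ...).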
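# my_list is the nested list from the question (named `content` there)
flat_list <- lapply(my_list, function(x) data.frame(t(rapply(x, function(y) y[1]))))
# unname() drops the repeated list names so rbind does not use them as row names
final_df <- do.call(rbind, unname(flat_list))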
Output
final_df
#   X.status X._fa coredata.dc.identifier coredata.document.count coredata.cited.by.count h.index coauthor.count preferred.name.surname preferred.name.given.name
# 1    found  true  AUTHOR_ID:55604964500                       6                      13       3              7            García Cruz            Gustavo Adolfo
# 2    found  true  AUTHOR_ID:56595713900                       4                      21       3              5                 Akimov                    Alexey
# 3    found  true  AUTHOR_ID:12792624600                      10                     117       6              7                 Alecke                     Björn
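On the first issue in the question: each top-level element is a list of length 1 that wraps the actual author record, which is why content[[3]][[2]] is out of bounds while content[[3]][[1]] works. If you only want the seven requested fields (rather than every leaf, as rapply returns), a minimal sketch along these lines may be enough. Note that make_author and extract_author are hypothetical helper names, the sample data is a cut-down copy of two authors from the question, and the code assumes every field is present for every author.

```r
# Hypothetical helper: builds one element with the same shape as in the question.
make_author <- function(id, docs, cites, h, co, surname, given) {
  list(list(
    `@status` = "found", `@_fa` = "true",
    coredata = list(`dc:identifier` = id, `document-count` = docs,
                    `cited-by-count` = cites),
    `h-index` = h, `coauthor-count` = co,
    `preferred-name` = list(surname = surname, `given-name` = given)
  ))
}

# Two-author sample copied from the question's data.
content <- list(
  `author-retrieval-response` = make_author(
    "AUTHOR_ID:56595713900", "4", "21", "3", "5", "Akimov", "Alexey"),
  `author-retrieval-response` = make_author(
    "AUTHOR_ID:12792624600", "10", "117", "6", "7", "Alecke", "Björn")
)

# content[[2]] has length 1, so the author record is its first (only) element.
author <- content[[2]][[1]]
author$coredata$`dc:identifier`
author[["preferred-name"]][["surname"]]

# Hypothetical helper: pulls only the requested fields into a one-row data frame.
# Assumes all fields exist; a missing field would make data.frame() fail.
extract_author <- function(x) {
  a <- x[[1]]
  data.frame(
    identifier     = a$coredata$`dc:identifier`,
    document_count = as.integer(a$coredata$`document-count`),
    cited_by_count = as.integer(a$coredata$`cited-by-count`),
    h_index        = as.integer(a$`h-index`),
    coauthor_count = as.integer(a$`coauthor-count`),
    surname        = a$`preferred-name`$surname,
    given_name     = a$`preferred-name`$`given-name`,
    stringsAsFactors = FALSE
  )
}

# One row per author; unname() keeps the row names as plain 1..n.
authors_df <- do.call(rbind, lapply(unname(content), extract_author))
authors_df
```

Compared with the rapply approach, this names each path explicitly, so the columns are limited to what you asked for and the counts can be converted to integers on the way in.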
Upvotes: 3