Reputation: 301
I have a list of lists of lists; let's call it mat. I want to convert it to a data frame.
Here are some sample contents.
[[14]][[1000]]
[[14]][[1000]][[1]]
[1] 51
[[14]][[1000]][[2]]
[1] 10
[[14]][[1000]][[3]]
[1] "C Hou" "C Han"
[[14]][[1000]][[4]]
[1] "Communication Middleware and Software for QoS Control in Distributed Real-Time EnvironmentsSpecifically, we consider the following innovative research components "
[[14]][[1000]][[5]]
[1] "COMPSAC International Computer Software and Applications Conference"
They are: paper ID, author ID, coauthor names, paper title, and journal title.
This large list is generated from 14 text files, and I happened to pick the last one printed to the console, hence the first index [[14]]. The second index [[1000]] refers to the 1000th entry (record) in that text file, and the third index ([[1]] to [[5]]) identifies the field (paper ID, author ID, coauthor names, paper title, and journal title).
I have tried a few approaches from SO without luck. Whenever I try to convert it to a data frame, I get this error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
  arguments imply differing number of rows: 1, 0
Moreover, when I use x = mat[[1]] to extract a single list of lists (the one from the first text file), I cannot even view it. View(x) produces the same error:
Error in View : arguments imply differing number of rows: 1, 0
I am completely lost as to how to convert this large list into a dataframe that I can use. Thanks.
Upvotes: 1
Views: 2181
Reputation: 1790
I tried to recreate some sample data that matches the structure of your data (I hope I got it right):
## Create sample data:
createList <- function(j){
  nElem <- 5
  paperIDVec <- sample.int(1000, nElem, replace = FALSE)
  authorIDVec <- sample.int(1000, nElem, replace = FALSE)
  ## Each entry gets between one and three co-authors:
  coauthorsList <- lapply(1:nElem, function(ii){
    paste("Coauthor", 1:sample.int(3, 1))
  })
  paperTitleVec <- paste("Some brilliant idea that author", authorIDVec, "had")
  journalVec <- vapply(1:nElem, function(ii) paste("Journal",
    paste(LETTERS[sample.int(26, 3, replace = TRUE)], collapse = "")), character(1))
  ## One list of five fields per entry:
  outList <- lapply(1:nElem, function(ii){
    list(paperIDVec[ii], authorIDVec[ii],
         coauthorsList[[ii]], paperTitleVec[ii],
         journalVec[ii])
  })
  outList
}
mat <- lapply(1:4, createList)
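Before converting, it can help to check how long each field of an entry is, since a data frame row needs exactly one value per column. The following is only a minimal sketch: matSmall is a hypothetical hand-built stand-in for mat, with field values loosely modelled on the record shown in the question.

```r
## Hypothetical stand-in for mat: 2 "text files" with 1 entry each
entry <- list(51, 10, c("C Hou", "C Han"), "Some paper title", "Some journal title")
matSmall <- list(list(entry), list(entry))

## Three levels: text file -> entry -> field
str(matSmall[[1]][[1]])

## Length of each field in one entry; any value other than 1
## has to be collapsed before the entry can become one row
fieldLengths <- lengths(matSmall[[1]][[1]])
print(fieldLengths)
```

The same lengths() call can be run on entries of the real mat to see which fields hold more (or fewer) than one value.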
Using this sample data (mat) and following the approach of @chinsoon12, I first pasted the entries together to create a single character string for each field (e.g. a vector of three co-authors c("Mr. X", "Mrs. J", "Mr. M") becomes "Mr. X, Mrs. J, Mr. M"). I then turned each entry into a data frame and successively combined them to create one big data frame:
## Turn nested list into one data frame:
textFileDfList <- lapply(mat, function(listLevel2) {
  ## Convert list on second level of hierarchy (= one text file)
  ## to a list of data frames (one for each entry)
  dataFrameList <- lapply(listLevel2, function(listLevel3){
    ## Paste multiple entries (e.g. vector of co-authors)
    ## together to create a single character entry:
    simplifiedList <- lapply(listLevel3,
                             function(entries) paste(entries, collapse = ", "))
    ## Create data.frame:
    outDf <- as.data.frame(simplifiedList,
                           stringsAsFactors = FALSE,
                           col.names = c("paper ID", "author ID", "coauthor names",
                                         "paper title", "journal title"))
  })
  ## Combine data frames of the single entries to one data frame,
  ## containing all entries of the text file:
  textFileDf <- do.call('rbind', dataFrameList)
})
## Combine data frames of the text files to one big data frame:
bigDataFrame <- do.call('rbind', textFileDfList)
> head(bigDataFrame)
paper.ID author.ID coauthor.names
1 862 990 Coauthor 1, Coauthor 2
2 688 400 Coauthor 1
3 921 963 Coauthor 1, Coauthor 2, Coauthor 3
4 479 455 Coauthor 1, Coauthor 2
5 709 340 Coauthor 1
6 936 591 Coauthor 1, Coauthor 2
paper.title journal.title
1 Some brilliant idea that author 990 had Journal PZR
2 Some brilliant idea that author 400 had Journal MQD
3 Some brilliant idea that author 963 had Journal WFW
4 Some brilliant idea that author 455 had Journal TZV
5 Some brilliant idea that author 340 had Journal DCR
6 Some brilliant idea that author 591 had Journal EGW
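The error in the question ("differing number of rows: 1, 0") would be consistent with some fields in the real data being empty, for example a paper with no coauthors stored as character(0). That is an assumption, since the real data is not available here. The sketch below uses a hypothetical entry to show that such a field breaks a direct conversion, while pasting with collapse turns it into an empty string of length one.

```r
## Hypothetical entry whose coauthor field is empty
## (an assumption about what the real data may contain)
entry <- list(51, 10, character(0), "Some title", "Some journal")
colNames <- c("paper ID", "author ID", "coauthor names",
              "paper title", "journal title")

## Direct conversion fails: the columns would have lengths 1 and 0
res <- tryCatch(as.data.frame(entry, col.names = colNames),
                error = function(e) conditionMessage(e))
print(res)

## Collapsing each field first gives every column exactly one value;
## paste(character(0), collapse = ", ") returns ""
fixed <- lapply(entry, function(x) paste(x, collapse = ", "))
df <- as.data.frame(fixed, stringsAsFactors = FALSE, col.names = colNames)
str(df)
```

If the real entries do contain empty fields, this is why the collapse step matters even for fields that never hold more than one value.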
Upvotes: 1
Reputation: 25225
You can use nested lapply calls to process each level of the nested list, as follows:
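papers <- do.call(rbind, lapply(mat, function(txtfile) {
  # combine the one-row data frames of a text file into one data frame
  do.call(rbind, lapply(txtfile, function(entry) {
    # to handle multiple coauthors and paste into a single string
    l <- lapply(entry, function(eachcol) {
      paste(eachcol, collapse = ", ")
    })
    # name the columns here so that rbind can match them across entries
    as.data.frame(l, stringsAsFactors = FALSE,
                  col.names = c("paper ID", "author ID", "coauthor names",
                                "paper title", "journal title"))
  }))
}))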
names(papers) <- c("paper ID", "author ID", "coauthor names", "paper title", "journal title")
I do not have the data to test it so do give me a shout if this still fails.
A related question: why not read the text files in as data frames rather than as lists?
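A side note that applies to both answers: because paste() is applied to every field, the ID columns end up as character strings rather than numbers. A minimal sketch of converting them back, using a small hypothetical stand-in (papersDemo) for the combined data frame:

```r
## Hypothetical stand-in for the combined data frame,
## with the IDs stored as character after paste()
papersDemo <- data.frame("paper ID" = c("862", "688"),
                         "author ID" = c("990", "400"),
                         "coauthor names" = c("Coauthor 1, Coauthor 2", "Coauthor 1"),
                         stringsAsFactors = FALSE, check.names = FALSE)

## Convert the ID columns back to integer
idCols <- c("paper ID", "author ID")
papersDemo[idCols] <- lapply(papersDemo[idCols], as.integer)
str(papersDemo)
```

If the column names were left in their dotted form (paper.ID, author.ID), idCols would need to be adjusted accordingly.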
Upvotes: 1