Waht

Reputation: 301

Large list of lists to dataframe

I have a list of lists of lists; let's call it mat. I want to convert it to a data frame.

Here are some sample contents.

[[14]][[1000]]
[[14]][[1000]][[1]]
[1] 51

[[14]][[1000]][[2]]
[1] 10

[[14]][[1000]][[3]]
[1] "C Hou" "C Han"

[[14]][[1000]][[4]]
[1] "Communication Middleware and Software for QoS Control in Distributed Real-Time EnvironmentsSpecifically, we consider the following innovative research components "

[[14]][[1000]][[5]]
[1] "COMPSAC International Computer Software and Applications Conference"

The five fields are, in order: paper ID, author ID, coauthor names, paper title, and journal title.

This large list was generated from 14 text files, and I happened to pick the last element printed to the console, hence the first index of [[14]]. The second index, [[1000]], refers to the 1000th entry (record) in that text file, and the third index, [[1]], is the position of the field within the entry (paper ID, author ID, coauthor names, paper title, journal title).

I have tried a few approaches found on SO without luck. Whenever I try to convert the list to a data frame, I get the error: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0.

Moreover, when I use x = mat[[1]] to extract a single list of lists (the one from the first text file), I cannot even view it. View(x) produces the same error: Error in View : arguments imply differing number of rows: 1, 0.

I am completely lost as to how to convert this large list into a data frame that I can use. Thanks.

Upvotes: 1

Views: 2181

Answers (2)

ikop

Reputation: 1790

I tried to recreate some sample data that matches the structure of your data (I hope I got it right):

## Create sample data:
createList <- function(j){
    nElem <- 5
    paperIDVec <- sample.int(1000, nElem, replace = FALSE) 
    authorIDVec <- sample.int(1000, nElem, replace = FALSE) 
    coauthorsList <- lapply(1:nElem, function(ii){
                paste("Coauthor", 1:sample.int(3, 1))           
            })
    paperTitleVec <- paste("Some brilliant idea that author", authorIDVec, "had")
    journalVec <- vapply(1:nElem, function(ii) paste("Journal", 
                        paste(LETTERS[sample.int(26, 3, replace = TRUE)], collapse = "")), character(1))
    ## Return one list per entry, holding the five fields in a fixed order:
    lapply(1:nElem, function(ii){
                list(paperIDVec[ii], authorIDVec[ii],
                        coauthorsList[[ii]], paperTitleVec[ii],
                        journalVec[ii])
            })
}
mat <- lapply(1:4, createList)

Using this data and following the approach of @chinsoon12, I first pasted the values of each field together so that every field is a single character string (e.g. a vector of three co-authors c("Mr. X", "Mrs. J", "Mr. M") becomes "Mr. X, Mrs. J, Mr. M"). I then turned each entry into a one-row data frame and combined these step by step, first per text file and then across text files, into one big data frame:

## Turn nested list into one data frame:
textFileDfList <- lapply(mat, function(listLevel2) {            
            ## Convert list on second level of hierarchy (= one text file)
            ## to a list of data frames (one for each entry)            
            dataFrameList <- lapply(listLevel2, function(listLevel3){
                        ## Paste multiple entries (e.g. vector of co-authors)
                        ## together to create a single character entry:
                        simplifiedList <- lapply(listLevel3, 
                                function(entries) paste(entries, collapse = ", "))
                        ## Create data.frame:
                        outDf <- as.data.frame(simplifiedList, 
                                stringsAsFactors = FALSE, 
                                col.names =  c("paper ID", "author ID", "coauthor names", 
                                        "paper title", "journal title"))                                                    
                    })

            ## Combine data frames of the single entries to one data frame,
            ## containing all entries of the text file:
            textFileDf <- do.call('rbind', dataFrameList)           
        })
## Combine data frames of the text files to one big data frame:
bigDataFrame <- do.call('rbind', textFileDfList)

> head(bigDataFrame)
  paper.ID author.ID                     coauthor.names
1      862       990             Coauthor 1, Coauthor 2
2      688       400                         Coauthor 1
3      921       963 Coauthor 1, Coauthor 2, Coauthor 3
4      479       455             Coauthor 1, Coauthor 2
5      709       340                         Coauthor 1
6      936       591             Coauthor 1, Coauthor 2
                              paper.title journal.title
1 Some brilliant idea that author 990 had   Journal PZR
2 Some brilliant idea that author 400 had   Journal MQD
3 Some brilliant idea that author 963 had   Journal WFW
4 Some brilliant idea that author 455 had   Journal TZV
5 Some brilliant idea that author 340 had   Journal DCR
6 Some brilliant idea that author 591 had   Journal EGW
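
A possible explanation for the error in the question, assuming some fields in the real data are empty (for example an entry without coauthors stored as character(0), which is only a guess since I have not seen the data): data.frame() cannot combine a column of length 1 with a column of length 0, whereas paste(..., collapse = ", ") always returns exactly one string. A minimal sketch with a made-up entry:

```r
## Hypothetical entry whose coauthor field is empty. This is an assumption
## about what the real data may contain, not something taken from it.
entry <- list(51L, 10L, character(0), "Some title", "Some journal")
cols <- c("paper ID", "author ID", "coauthor names",
        "paper title", "journal title")

## Field lengths of the entry; a 0 marks the empty field:
fieldLengths <- lengths(entry)

## Converting the entry directly fails because the columns would have
## lengths 1 and 0; capture the error message instead of stopping:
directResult <- tryCatch(
        as.data.frame(entry, col.names = cols, stringsAsFactors = FALSE),
        error = function(e) conditionMessage(e))

## After collapsing, every field has length 1 (the empty one becomes ""),
## so the conversion yields a one-row data frame:
collapsed <- lapply(entry, function(x) paste(x, collapse = ", "))
fixedDf <- as.data.frame(collapsed, col.names = cols,
        stringsAsFactors = FALSE)
```

Running lengths() on a few entries of the real mat should show whether empty fields are actually the cause there.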

Upvotes: 1

chinsoon12

Reputation: 25225

You can use nested lapply calls to process each level of the nested list, as follows:

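cols <- c("paper ID", "author ID", "coauthor names", "paper title", "journal title")
papers <- do.call(rbind, lapply(mat, function(txtfile) {
    # bind the one-row data frames of all entries in this text file
    do.call(rbind, lapply(txtfile, function(entry) {
        # collapse multi-valued fields (e.g. several coauthors) into a single string
        l <- lapply(entry, function(eachcol) {
            paste(eachcol, collapse = ", ")
        })
        # give every entry the same column names so that rbind can match them
        names(l) <- cols
        as.data.frame(l, stringsAsFactors = FALSE)
    }))
}))
# as.data.frame turns the names into syntactic ones (e.g. paper.ID); restore the originals
names(papers) <- cols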

I do not have the data to test it, so do give me a shout if this still fails.

A related question: why are you not reading the text files in as data frames rather than as lists?

Upvotes: 1
