Leni Ohnesorge

Reputation: 726

Converting a list of lists into a data.frame in R efficiently

I have a list of lists (qlist) whose inner lists contain elements of different lengths (see the tags list in the example). I'd like to convert selected elements (tags and question_id, skipping creation_date) into a data.frame with one row per tag: the tag in the first column and the corresponding question_id in the second.

qlist <- list()

qlist[[1]] <- list(tags = list( "r", "parallel-processing"), creation_date = "1459613802",
question_id = "36375667")
qlist[[2]] <- list(tags = list( "r"), creation_date = "1459613803", question_id = "36375668")

I've managed to do so with the following code:

library(plyr)
df_qst_tags <- ldply(qlist, function(x) {
  as.data.frame(cbind(tag = unlist(x$tags), question_id = x$question_id))
}, .progress = "win")

and the result is as expected: tags in a first column with a corresponding question_id in the second column.

> df_qst_tags
                  tag question_id
1                   r    36375667
2 parallel-processing    36375667
3                   r    36375668

Unfortunately, my qlist is very large and my code is too slow. How can I rewrite the solution in a more efficient way?

Upvotes: 1

Views: 912

Answers (1)

Martin Morgan

Reputation: 46856

Extract the tags and find their geometry (the number of tags per question):

> tags = lapply(qlist, "[[", "tags")
> lengths(tags)
[1] 2 1

Unlisting tags gives a single vector of the individual tags. Then extract the other elements, e.g., question_id, and replicate each one according to the geometry of the tags, along the lines of

data.frame(tag=unlist(tags, use.names=FALSE),
           question_id = rep(
               vapply(qlist, "[[", character(1), "question_id"),
               lengths(tags)))
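
For reference, the steps above can be put together into a self-contained sketch run against the sample qlist from the question. The helper name tags_to_df is hypothetical, and stringsAsFactors = FALSE is an added assumption so that both columns stay character regardless of R version. It also assumes every element of qlist has exactly one question_id string, which is what vapply(..., character(1), ...) checks.

```r
# Sample input copied from the question
qlist <- list(
  list(tags = list("r", "parallel-processing"),
       creation_date = "1459613802", question_id = "36375667"),
  list(tags = list("r"),
       creation_date = "1459613803", question_id = "36375668")
)

# tags_to_df is a hypothetical helper name, not part of any package
tags_to_df <- function(qlist) {
  tags <- lapply(qlist, "[[", "tags")   # one list of tags per question
  n <- lengths(tags)                    # number of tags per question
  ids <- vapply(qlist, "[[", character(1), "question_id")
  data.frame(
    tag = unlist(tags, use.names = FALSE),  # flatten all tags into one vector
    question_id = rep(ids, n),              # repeat each id once per tag
    stringsAsFactors = FALSE                # assumption: keep character columns
  )
}

df_qst_tags <- tags_to_df(qlist)
print(df_qst_tags)
```

The speed-up over the ldply version most likely comes from building a single data.frame from two long vectors, rather than creating one small data.frame per question and binding them all together.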

Upvotes: 3
