Mayou

Reputation: 8848

Fast way of converting large list to dataframe

I have a huge list (700 elements), each element being a vector of length 16,000. I am looking for an efficient way of converting the list to a data frame in the following fashion (this is just a mock example):

lst <- list(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))

The end result I am looking for is:

 #  [,1] [,2] [,3]
 #a    1    2    3
 #b    4    5    6
 #c    7    8    9

This is what I have tried, but it isn't working as I wish:

library(data.table)
result = rbindlist(Map(as.data.frame, lst))

What can I try? Bear in mind that my real example has huge dimensions, and I would need a rather efficient way of doing this operation.

Upvotes: 7

Views: 30828

Answers (3)

G. Grothendieck

Reputation: 270045

Try this. We assume the components of L are all of the same length, n, and we also assume no row names:

L <- list(a = 1:4, b = 4:1) # test input

n <- length(L[[1]])
DF <- structure(L, row.names = c(NA, -n), class = "data.frame")
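
As a quick check under the same assumptions (equal-length components, no row names), here is a minimal sketch that applies the same idea to the question's mock list. Note that each list element becomes a column here, so the one-row-per-element layout shown in the question would still need a transpose.

```r
lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

n <- length(lst[[1]])
# Set the data frame attributes directly, skipping the checks data.frame() performs.
# c(NA_integer_, -n) is the compact form for automatic row names 1..n.
DF <- structure(lst, row.names = c(NA_integer_, -n), class = "data.frame")

is.data.frame(DF)
names(DF)   # the list names become the column names
dim(DF)
```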

Upvotes: 19

Ben Bolker

Reputation: 226712

I think

lst <- list(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))
do.call(rbind,lst)

works. I don't know if there's a sneakier/dangerous/corner-cutting way to do it that's more efficient.

You could also try

m <- matrix(unlist(lst),byrow=TRUE,ncol=length(lst[[1]]))
rownames(m) <- names(lst)
as.data.frame(m)

... maybe it's faster?
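
One way to find out is to time both routes on data of roughly the size described in the question. The sketch below uses synthetic random vectors as a stand-in for the real data (the names `big`, `m1`, `m2` are just for illustration), and the timings will depend on the machine and R version.

```r
set.seed(1)
# Synthetic stand-in for the real data: 700 vectors of length 16,000
big <- replicate(700, runif(16000), simplify = FALSE)
names(big) <- paste0("v", seq_along(big))

# Route 1: bind the vectors as rows
t_rbind <- system.time(m1 <- do.call(rbind, big))

# Route 2: flatten, then fill a matrix row by row
t_unlist <- system.time({
  m2 <- matrix(unlist(big, use.names = FALSE), byrow = TRUE,
               ncol = length(big[[1]]))
  rownames(m2) <- names(big)
})

# Compare the two results, then the elapsed times in seconds
all.equal(m1, m2)
c(rbind = t_rbind[["elapsed"]], unlist = t_unlist[["elapsed"]])
```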

You may not be able to do very much about speeding up the as.data.frame step. Looking at as.data.frame.matrix to see what could be stripped to make it as bare-bones as possible, it seems that the crux is probably that the columns have to be copied into their own individual list elements:

for (i in ic) value[[i]] <- as.vector(x[, i])

You could try stripping down as.data.frame.matrix to see if you can speed it up, but I'm guessing that this operation is the bottleneck. In order to get around it you have to find some faster way of mapping your data from a list of rows into a list of columns (perhaps an Rcpp solution??).
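
A minimal sketch of what such a stripped-down version might look like is below. The helper name `matrix_to_df_bare` is hypothetical, it skips all the checks that as.data.frame.matrix performs, and it still copies every column, so whether it is actually faster would need to be measured.

```r
# Hypothetical helper: a bare-bones stand-in for as.data.frame.matrix
matrix_to_df_bare <- function(m) {
  # Copy each column into its own list element (the step identified above);
  # as.vector() drops the row names that m[, j] would otherwise carry along
  cols <- lapply(seq_len(ncol(m)), function(j) as.vector(m[, j]))
  names(cols) <- paste0("V", seq_len(ncol(m)))
  rn <- rownames(m)
  if (is.null(rn)) rn <- c(NA_integer_, -nrow(m))  # compact automatic row names
  structure(cols, row.names = rn, class = "data.frame")
}

lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
df <- matrix_to_df_bare(do.call(rbind, lst))
df
```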

The other thing to consider is whether you really need a data frame -- if your data are of a homogeneous type, you could just keep the results as a matrix. Matrix operations on big data are a lot faster anyway ...
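
For instance, assuming all elements are numeric, a sketch of staying with the matrix might look like this; row-wise summaries and lookups by name work without ever building a data frame.

```r
lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# One row per list element; the list names become the row names
m <- do.call(rbind, lst)

row_means <- rowMeans(m)   # one summary value per original list element
row_b <- m["b", ]          # pull a single element back out by name
```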

Upvotes: 6

Señor O

Reputation: 17432

How about just t(as.data.frame(List))?

> A = 1:16000
> List = list()
> for(i in 1:700) List[[i]] = A
> system.time(t(as.data.frame(List)))
   user  system elapsed 
   0.25    0.00    0.25 
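
One thing to keep in mind: t() applied to a data frame returns a matrix, not a data frame, so a further as.data.frame() call is needed if a data frame is really required. A small sketch on the question's mock list:

```r
lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

res <- t(as.data.frame(lst))
is.matrix(res)       # t() converts the data frame to a matrix before transposing
rownames(res)        # the list names end up as row names

# Convert back only if a data frame is actually needed
res_df <- as.data.frame(res)
is.data.frame(res_df)
```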

Upvotes: 3
