user680111

Reputation: 1011

Combine a list of data.tables

Is there a specific method for combining a list of data.tables in R?

I have a list of ~20 data.tables, each with around 1 million rows, and would like to combine them into one data.table with 20 million rows.

I've been doing it with

Reduce('rbind', data.table)

but it takes a while.

Thanks!

Upvotes: 27

Views: 16480

Answers (3)

Alex Brown

Reputation: 42872

For my money, the plyr package's ldply is the best way to do this. It has the advantage that the name of the list element is added as a new first column, named .id.

In addition, a list of data frames is often the output of tapply, in which case replace the whole shebang with ddply.

Alternatives include do.call("rbind", mylist) or lattice's make.groups (haven't been able to find this one recently though).


Note: I may have misunderstood the question; I read data.frame instead of data.table. These techniques still work, but I'm not sure they always return a data.table.
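
One way to settle the doubt in the note above is to check the class of the result directly. The following is a minimal sketch, assuming a reasonably recent data.table; the names mylist, combined and plainFrame are made up for illustration. It combines a small list with do.call("rbind", ...) and tests the result with is.data.table(). If some other method hands back a plain data.frame, as.data.table() converts it.

```r
library(data.table)

# Two tiny data.tables standing in for the real list (hypothetical data)
mylist <- list(
  data.table(x = 1:2, y = c("a", "b")),
  data.table(x = 3:4, y = c("c", "d"))
)

# rbind dispatches on data.table, so the result should stay a data.table
combined <- do.call("rbind", mylist)
print(is.data.table(combined))

# If a method returns a plain data.frame instead, convert it explicitly
plainFrame <- data.frame(x = 1:4, y = c("a", "b", "c", "d"))
converted <- as.data.table(plainFrame)
print(is.data.table(converted))
```

The same check works on whatever ldply or ddply returns, so it is cheap to add after any of the alternatives listed above.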

Upvotes: 2

Chase

Reputation: 69151

Using do.call appears to be about 10x faster with this made-up example:

library(data.table)

x1 <- data.table(x = runif(1e6), y = runif(1e6))
x2 <- data.table(x = runif(1e6), y = runif(1e6))

# 20 data.tables, each with 1e6 rows
yourList <- list(x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2)

system.time(out1 <- Reduce("rbind", yourList))
#-----
   user  system elapsed 
   3.37    3.03    6.43 
system.time(out2 <- do.call("rbind", yourList))
#-----
   user  system elapsed 
   0.33    0.36    0.68 
all.equal(out1,out2)
#-----
[1] TRUE

Edit - to incorporate Matt's answer

I did not realize data.table had a specific function for this task. Par for the course, it is quite fast. Here is the relevant timing:

system.time(out3 <- rbindlist(yourList))
#-----
   user  system elapsed 
   0.07    0.03    0.11 

all.equal(out1,out3)
#-----
[1] TRUE

Upvotes: 24

Matt Dowle

Reputation: 59602

See ?rbindlist and these related questions (easier to find when you know what to search for!):

data.table questions and answers containing rbindlist
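
For readers who want a quick feel for rbindlist before reading the help page, here is a small sketch on toy data. The list and column names (tables, first, second, source) are invented for illustration, and the idcol and fill arguments are assumed to be available, which holds for recent data.table releases but may not for very old ones.

```r
library(data.table)

# A named list of small data.tables (hypothetical data)
tables <- list(
  first  = data.table(x = 1:2, y = c("a", "b")),
  second = data.table(x = 3:4, y = c("c", "d"))
)

# Basic use: stack every element of the list into one data.table
combined <- rbindlist(tables)

# idcol adds a column recording which list element each row came from
withSource <- rbindlist(tables, idcol = "source")

# fill = TRUE pads columns that are missing from some elements with NA
uneven <- list(data.table(x = 1L), data.table(x = 2L, z = "only here"))
filled <- rbindlist(uneven, fill = TRUE)

print(combined)
print(withSource)
print(filled)
```

The idcol argument plays a role similar to the .id column that ldply adds, so the origin of each row can be kept without leaving data.table.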

Upvotes: 26
