Reputation: 2574
I've noticed that cbind takes considerably longer than rbind for data.tables. What is the reason for this?
> library(data.table)
> dt <- as.data.table(mtcars)
> dt.new <- copy(dt)
> timeit({for (i in 1:100) dt.new <- rbind(dt.new, dt)})
   user  system elapsed
  0.237   0.012   0.253
> dt.new <- copy(dt)
> timeit({for (i in 1:100) dt.new <- cbind(dt.new, dt)})
   user  system elapsed
 14.795   0.090  14.912
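where timeit is defined as:
timeit <- function(expr)
{
    ptm <- proc.time()
    expr                # the lazily evaluated argument is forced here, between the two clock readings
    proc.time() - ptm
}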
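For anyone reproducing the comparison, base R's system.time does much the same job as the timeit helper (it additionally runs a garbage collection before timing by default). What follows is a minimal sketch, assuming data.table is installed; the iteration count n = 20 and the names dt.rows and dt.cols are my own choices to keep the run short, so the timings will not match the figures quoted in the question.

```r
library(data.table)

dt <- as.data.table(mtcars)
n <- 20L  # hypothetical iteration count, smaller than the 100 used in the question

# Grow by rows: each pass appends another copy of dt underneath.
dt.rows <- copy(dt)
t.rbind <- system.time(for (i in seq_len(n)) dt.rows <- rbind(dt.rows, dt))

# Grow by columns: each pass appends another copy of dt alongside.
dt.cols <- copy(dt)
t.cbind <- system.time(for (i in seq_len(n)) dt.cols <- cbind(dt.cols, dt))

print(t.rbind)
print(t.cbind)
```

Timings depend heavily on the data.table version and the machine, so compare the two results with each other and not with the numbers posted here.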
Upvotes: 10
Views: 13365
Reputation: 49448
Ultimately I think this comes down to alloc.col being slow due to a loop where it removes various attributes from the columns. I'm not entirely sure why that's done; perhaps Arun or Matt can explain.
As you can see below, the basic operations for cbind are much faster than those for rbind:
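library(data.table)
library(microbenchmark)
dt = as.data.table(mtcars)
# Bare-bones cbind: relies on unexported data.table internals, which may change between versions.
cbind.dt.simple = function(...) {
    x = c(...)                                           # concatenate the column lists into one list
    setattr(x, "class", c("data.table", "data.frame"))   # set the class by reference
    # over-allocate column slots, as a regular data.table would have
    ans = .Call(data.table:::Calloccolwrapper, x, max(100L, ncol(x) + 64L), FALSE)
    .Call(data.table:::Csetnamed, ans, 0L)
}
microbenchmark(rbind(dt, dt), cbind(dt, dt), cbind.dt.simple(dt, dt))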
#Unit: microseconds
#                    expr      min        lq      mean    median        uq       max neval
#           rbind(dt, dt)  785.318  996.5045 1665.1762 1234.4045 1520.3830 21327.426   100
#           cbind(dt, dt) 2350.275 3022.5685 3885.0014 3533.7595 4093.1975 21606.895   100
# cbind.dt.simple(dt, dt)   74.125  116.5290  168.5101  141.9055  180.3035  1903.526   100
Upvotes: 9