Reputation: 85
I have two tables of the type data.table. I want to combine the tables into one table. They look like:
DT1:
1 A B C
2 A B C
3 A B C
DT2:
D E F
D E F
D E F
I want to combine them like:
1 A B C D E F
2 A B C D E F
3 A B C D E F
I don't know how I can do this by merging, because the data tables don't have common column names.
I know this is possible with cbind, but I'm working with a lot of rows, so I prefer a function which is built to work with big data tables.
Could anyone tell me how to do this?
Upvotes: 2
Views: 1590
Reputation: 16971
Another option is to assign DT2
as columns of DT1
:
DT1[, colnames(DT2) := DT2]
It is considerably faster than cbind
. Benchmarking:
library(data.table)
DT1 <- data.table(A = sample(LETTERS, 1e6, 1), B = sample(1e6))
DT2 <- data.table(C = sample(LETTERS, 1e6, 1), D = sample(1e6))
DT <- copy(DT1)
microbenchmark::microbenchmark(cbind = cbind(DT1, DT2),
":=" = DT1[, colnames(DT2) := DT2],
setup = {DT1 <- copy(DT)})
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> cbind 5.8867 6.56670 15.157835 11.15040 13.27365 80.9990 100
#> := 4.6967 5.06075 8.011416 5.30005 5.92710 46.9052 100
Upvotes: 1
Reputation: 27732
building on @s_t answer, here is a benchmark of an update join using data.table
DT1 = data.frame(A = rep('A', 300000), B = rep('B', 300000))
DT2 = data.frame(C = rep('C', 300000), D = rep('D', 300000))
library(data.table)
setDT(DT1)
setDT(DT2)
microbenchmark::microbenchmark(
cbind = {
dt1 <-copy(DT1)
dt2 <-copy(DT2)
result <- cbind(DT1, DT2)
},
update_join = {
dt1 <-copy(DT1)
dt2 <-copy(DT2)
dt1[, id := .I][ dt2[, id := .I], c("C", "D") := .(i.C, i.D), on = .(id)][, id := NULL]
} )
# Unit: milliseconds
# expr min lq mean median uq max neval
# cbind 1.8889 2.68405 9.454567 2.99505 3.62625 226.4432 100
# update_join 23.9186 24.67530 36.957518 25.62405 36.42760 249.3631 100
cbind()
still wins by a landslide...
Upvotes: 2
Reputation: 9485
I've tried something with some slighter bigger table (using the code provided by Hart Radev) and I've microbenchmark
ed them, maybe it could be helpful:
library(dplyr)
library(microbenchmark)
DT1 = data.frame(A = rep('A', 300000), B = rep('B', 300000))
DT2 = data.frame(C = rep('C', 300000), D = rep('D', 300000))
microbenchmark(
bind_cols = {bind_cols(DT1, DT2)},
cbind = {cbind(DT1,DT2)},
# Hart solution
merge = { DT1$rowname = rownames(DT1)
DT2$rowname = rownames(DT2)
DT3 = merge(DT1, DT2, by = 'rowname')}
)
Unit: microseconds
expr min lq mean median uq max neval
bind_cols 72.534 88.9610 1.640497e+02 169.6010 209.4940 348.160 100
cbind 42.241 50.5610 8.019269e+01 61.4405 114.9875 250.455 100
merge 2142101.821 2256677.2310 2.574166e+06 2416274.7380 2732207.2465 5956733.422 100
data.table
is not my cup of tea but I suppose it could be helpful have a solution with it.
Upvotes: 2
Reputation: 360
If you want to try it by merging, just add rownames as a column, and do the merge:
DT1 = data.frame(A = rep('A', 3), B = rep('B', 3))
DT1$rowname = rownames(DT1)
DT2 = data.frame(C = rep('C', 3), D = rep('D', 3))
DT2$rowname = rownames(DT2)
DT3 = merge(DT1, DT2, by = 'rowname')
Upvotes: 1