Reputation: 397
I have a list of data.tables that I need to cbind; however, I only need the last X columns.
My data is structured as follows:
DT.1 <- data.table(x=c(1,1), y = c("a","a"), v1 = c(1,2), v2 = c(3,4))
DT.2 <- data.table(x=c(1,1), y = c("a","a"), v3 = c(5,6))
DT.3 <- data.table(x=c(1,1), y = c("a","a"), v4 = c(7,8), v5 = c(9,10), v6 = c(11,12))
DT.list <- list(DT.1, DT.2, DT.3)
>DT.list
[[1]]
x y v1 v2
1: 1 a 1 3
2: 1 a 2 4
[[2]]
x y v3
1: 1 a 5
2: 1 a 6
[[3]]
x y v4 v5 v6
1: 1 a 7 9 11
2: 1 a 8 10 12
Columns x and y are the same in each of the data.tables, but the number of columns differs. The output should not include duplicate x and y columns. It should look as follows:
x y v1 v2 v3 v4 v5 v6
1: 1 a 1 3 5 7 9 11
2: 1 a 2 4 6 8 10 12
I want to avoid using a loop. I am able to bind the data.tables using do.call("cbind", DT.list) and then remove the duplicates manually, but is there a way to avoid creating the duplicates in the first place? Also, efficiency is important, since the lists can be long and the data.tables large.
thanks
Upvotes: 5
Views: 5672
Reputation: 66819
Here's another way:
Reduce(
  function(x, y) {
    # columns of y that x does not have yet
    newcols = setdiff(names(y), names(x))
    x[, (newcols)] <- y[, ..newcols]
    x
  },
  DT.list,
  init = copy(DT.list[[1]][, c("x", "y")])
)
# x y v1 v2 v3 v4 v5 v6
# 1: 1 a 1 3 5 7 9 11
# 2: 1 a 2 4 6 8 10 12
This avoids modifying the list (as @bgoldst's <- NULL assignment does) or making copies of every element of the list (as, I think, the lapply approach does). I would probably do the <- NULL thing in most practical applications, though.
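To check the claim that the list is left untouched, here is a minimal sketch. It assumes data.table is attached; names_before and res are names made up for this illustration, and the list is trimmed to two tables for brevity.

```r
library(data.table)

DT.list <- list(
  data.table(x = c(1, 1), y = c("a", "a"), v1 = c(1, 2), v2 = c(3, 4)),
  data.table(x = c(1, 1), y = c("a", "a"), v3 = c(5, 6))
)

# Remember the column names of every element before binding.
names_before <- lapply(DT.list, names)

res <- Reduce(
  function(x, y) {
    # columns of y that x does not have yet
    newcols <- setdiff(names(y), names(x))
    x[, (newcols)] <- y[, ..newcols]
    x
  },
  DT.list,
  init = DT.list[[1]][, c("x", "y")]
)

# The combined table has x and y only once ...
names(res)
# ... and every element of the list still has its own x and y.
identical(lapply(DT.list, names), names_before)
```

The `[<-` assignment on a data.table works on a copy, which is why x can be extended inside the function without touching the elements of DT.list.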
Upvotes: 2
Reputation: 35314
Here's how it could be done in one shot, using lapply() to remove columns x and y from second-and-subsequent data.tables before calling cbind():
# DT.list[-1] also copes with a one-element list, where 2:length(DT.list) would count down as 2:1
do.call(cbind, c(DT.list[1], lapply(DT.list[-1], `[`, j = -c(1, 2))))
## x y v1 v2 v3 v4 v5 v6
## 1: 1 a 1 3 5 7 9 11
## 2: 1 a 2 4 6 8 10 12
Another approach is to remove columns x and y from second-and-subsequent data.tables before doing a straight cbind(). I think there's nothing wrong with using a for loop for this:
for (i in seq_along(DT.list)[-1]) DT.list[[i]][,c('x','y')] <- NULL;
DT.list;
## [[1]]
## x y v1 v2
## 1: 1 a 1 3
## 2: 1 a 2 4
##
## [[2]]
## v3
## 1: 5
## 2: 6
##
## [[3]]
## v4 v5 v6
## 1: 7 9 11
## 2: 8 10 12
##
do.call(cbind,DT.list);
## x y v1 v2 v3 v4 v5 v6
## 1: 1 a 1 3 5 7 9 11
## 2: 1 a 2 4 6 8 10 12
Upvotes: 1
Reputation: 6436
Another option would be to use the [ indexing function inside lapply on the list of data tables and exclude the "unwanted" columns (in your case x and y). This way, duplicate columns are not created.
# your given test data
DT.1 <- data.table(x=c(1,1), y = c("a","a"), v1 = c(1,2), v2 = c(3,4))
DT.2 <- data.table(x=c(1,1), y = c("a","a"), v3 = c(5,6))
DT.3 <- data.table(x=c(1,1), y = c("a","a"), v4 = c(7,8), v5 = c(9,10), v6 = c(11,12))
DT.list <- list(DT.1, DT.2, DT.3)
# cbind a list of subsetted data.tables
exclude.col <- c("x","y")
myDT <- do.call(cbind, lapply(DT.list, `[`,,!exclude.col, with = FALSE))
myDT
## v1 v2 v3 v4 v5 v6
## 1: 1 3 5 7 9 11
## 2: 2 4 6 8 10 12
# join x & y columns for final results
cbind(DT.list[[1]][,.(x,y)], myDT)
## x y v1 v2 v3 v4 v5 v6
## 1: 1 a 1 3 5 7 9 11
## 2: 1 a 2 4 6 8 10 12
A shorter version of the lapply call, without with = FALSE:
myDT <- do.call(cbind, lapply(DT.list, `[`,,!c("x","y")))
myDT
## v1 v2 v3 v4 v5 v6
## 1: 1 3 5 7 9 11
## 2: 2 4 6 8 10 12
# join x & y columns for final results
cbind(DT.list[[1]][,.(x,y)], myDT)
## x y v1 v2 v3 v4 v5 v6
## 1: 1 a 1 3 5 7 9 11
## 2: 1 a 2 4 6 8 10 12
do.call( cbind, c(list(DT.list[[1]][,.(x,y)]), lapply(DT.list, `[`,,!c("x","y"))) )
# way too many brackets...but I think it works
## x y v1 v2 v3 v4 v5 v6
## 1: 1 a 1 3 5 7 9 11
## 2: 1 a 2 4 6 8 10 12
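The same idea can be wrapped in a small function so that the shared columns are passed by name. This is only a sketch: cbind_shared and its arguments dt_list and key_cols are hypothetical names, not part of data.table, and it assumes every table has the same number of rows in the same order, since cbind() does no matching on x and y.

```r
library(data.table)

# Hypothetical helper: bind a list of data.tables side by side, keeping the
# shared columns (key_cols) only once, taken from the first element.
# Assumes all tables have their rows in the same order.
cbind_shared <- function(dt_list, key_cols) {
  keys <- dt_list[[1]][, key_cols, with = FALSE]
  rest <- lapply(dt_list, function(dt) {
    dt[, setdiff(names(dt), key_cols), with = FALSE]
  })
  do.call(cbind, c(list(keys), rest))
}

DT.1 <- data.table(x = c(1, 1), y = c("a", "a"), v1 = c(1, 2), v2 = c(3, 4))
DT.2 <- data.table(x = c(1, 1), y = c("a", "a"), v3 = c(5, 6))
DT.3 <- data.table(x = c(1, 1), y = c("a", "a"), v4 = c(7, 8), v5 = c(9, 10), v6 = c(11, 12))
DT.list <- list(DT.1, DT.2, DT.3)

out <- cbind_shared(DT.list, c("x", "y"))
out
```

Selecting by name rather than by position means the shared columns do not have to be the first two in every table.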
Upvotes: 0