Reputation: 667
I do a simply join of 2 datables as follows:
set.seed(1)
DT1 <- data.table(
Idx = rep(1:100),
x1 = round(rnorm(100,0.75,0.3),2),
x2 = round(rnorm(100,0.75,0.3),2),
x3 = round(rnorm(100,0.75,0.3),2))
DT2 <- data.table(
Idx2 = rep(1:100),
x1 = round(rep(pi,100),2),
targetcol = rep(999,100))
DT2[DT1,on = c(Idx2 = "Idx")]
This works, but there is a column i.x1
in the result, which I do not want. I only want to include the 'targetcol', hence the name. Now the problem is that in another example, I have many of these duplicate columns with the 'i' before them and therefore i would like to delete them or better exclude them during the merge. I know this should be possible with X[Y,.(...)]
, but I didn't find the right way how to fill the dots in .(...)
with all but one one column, i.e. with all but i.x1
. So I wonder what is the best way to select multiple columns in data table with the list-syntax as above?
Secondly I tried the newer merge syntax of datatable:
merge(x = DT1, y = DT2[,c("Idx2","targetcol")], by.x = "Idx",by.y = "Idx2", all.x=TRUE)
but it leads to a different column ordering, naming (x1.x
and x1.y
), and moreover, I read it is slower than the other way.
What is the best method to solve this (also in case there are many more columns and duplicates; this was just to illustrate the issue)?
Upvotes: 0
Views: 44
Reputation: 4367
Answer moved from comments with slight modification from HubertL code
DT1[DT2[, .(Idx2, targetcol)], on = c(Idx = "Idx2")]
Upvotes: 2