user3032689
user3032689

Reputation: 667

Duplicated col in datatable join

I do a simply join of 2 datables as follows:

set.seed(1)
DT1 <- data.table(
Idx = rep(1:100),  
x1 = round(rnorm(100,0.75,0.3),2),
x2 = round(rnorm(100,0.75,0.3),2),
x3 = round(rnorm(100,0.75,0.3),2))

DT2 <- data.table(
Idx2 = rep(1:100),
x1 = round(rep(pi,100),2),
targetcol = rep(999,100))

DT2[DT1,on = c(Idx2 = "Idx")]

This works, but there is a column i.x1 in the result, which I do not want. I only want to include the 'targetcol', hence the name. Now the problem is that in another example, I have many of these duplicate columns with the 'i' before them and therefore i would like to delete them or better exclude them during the merge. I know this should be possible with X[Y,.(...)], but I didn't find the right way how to fill the dots in .(...) with all but one one column, i.e. with all but i.x1. So I wonder what is the best way to select multiple columns in data table with the list-syntax as above?

Secondly I tried the newer merge syntax of datatable:

merge(x = DT1, y = DT2[,c("Idx2","targetcol")], by.x = "Idx",by.y = "Idx2", all.x=TRUE)

but it leads to a different column ordering, naming (x1.x and x1.y), and moreover, I read it is slower than the other way.

What is the best method to solve this (also in case there are many more columns and duplicates; this was just to illustrate the issue)?

Upvotes: 0

Views: 44

Answers (1)

manotheshark
manotheshark

Reputation: 4367

Answer moved from comments with slight modification from HubertL code

DT1[DT2[, .(Idx2, targetcol)], on = c(Idx = "Idx2")]

Upvotes: 2

Related Questions