Reputation: 3291
I have two dataframes (actually data.tables).
set.seed(123)
dt1 <- data.table(P=rep(letters[1:3],c(4,2,3)),X=sample(9))
dt1
P X
1: a 3
2: a 7
3: a 9
4: a 6
5: b 5
6: b 1
7: c 2
8: c 8
9: c 4
and:
dt2 <- data.table(P=rep(letters[1:5],length=10),D=c("X","Y","Z","G","F"))
dt2
P D
1: a X
2: b Y
3: c Z
4: d G
5: e F
6: a X
7: b Y
8: c Z
9: d G
10: e F
Now I want to add a new column to dt1, with column "D" of dt2 where P has the same value in dt1 and dt2. It should look like this:
dt_new
P X D
1: a 3 X
2: a 7 X
3: a 9 X
4: a 6 X
5: b 5 Y
6: b 1 Y
7: c 2 Z
8: c 8 Z
9: c 4 Z
Upvotes: 3
Views: 152
Reputation: 59602
+1 to Arun's answer. To show the update-by-reference way to do this ...
Example data from question again:
set.seed(123)
dt1 = data.table(P=rep(letters[1:3],c(4,2,3)),X=sample(9))
dt2 = data.table(P=rep(letters[1:5],length=10),D=c("X","Y","Z","G","F"))
Removed dups in example data using unique()
as Arun did :
dt2 = unique(dt2)
dt2
P D
1: a X
2: b Y
3: c Z
4: d G
5: e F
Now add D by reference to dt1 with data from dt2. Like a foreign key in SQL. Admittedly this syntax isn't obvious or particularly elegant but it does avoid the copy of dt1
. So it can be significantly faster if dt1
is say 10GB in size.
setkey(dt2, P)
dt1[,D:={ .P=P # allows us to refer to the P from dt1 on next line
dt2[.P,D]$D}] # since P is type character, no need to J() or .()
dt1
P X D
1: a 3 X
2: a 7 X
3: a 9 X
4: a 6 X
5: b 5 Y
6: b 1 Y
7: c 2 Z
8: c 8 Z
9: c 4 Z
Or, keeping the duplicates in dt2:
set.seed(123)
dt1 = data.table(P=rep(letters[1:3],c(4,2,3)),X=sample(9))
dt2 = data.table(P=rep(letters[1:5],length=10),D=c("X","Y","Z","G","F"))
setkey(dt2,P)
dt2
P D
1: a X
2: a X
3: b Y
4: b Y
5: c Z
6: c Z
7: d G
8: d G
9: e F
10: e F
dt1[,D:={ .P=P
dt2[.P,D,mult="first"]}]
dt1
P X D
1: a 3 X
2: a 7 X
3: a 9 X
4: a 6 X
5: b 5 Y
6: b 1 Y
7: c 2 Z
8: c 8 Z
9: c 4 Z
Upvotes: 6
Reputation: 118779
I'd do a data.table join
in this manner:
setkey(dt1, P)
dt1[unique(dt2),nomatch=0]
P X D
1: a 3 X
2: a 7 X
3: a 9 X
4: a 6 X
5: b 5 Y
6: b 1 Y
7: c 2 Z
8: c 8 Z
9: c 4 Z
Upvotes: 7