Reputation: 26258
When doing an update-join, where the i
table has multiple rows per key, how can you control which row is returned?
In this example, the update-join returns the last row from dt2
library(data.table)
dt1 <- data.table(id = 1)
dt2 <- data.table(id = 1, letter = letters)
dt1[
dt2
, on = "id"
, letter := i.letter
]
dt1
# id letter
# 1: 1 z
How can I control it to return the 1st, 2nd, nth row, rather than defaulting to the last?
A couple of references similar to this by user @Frank
Upvotes: 11
Views: 313
Reputation: 93938
The most flexible idea I can think of is to only join the part of dt2
which contains the rows you want. So, for the second row:
dt1[
dt2[, .SD[2], by=id]
, on = "id"
, letter := i.letter
]
dt1
# id letter
#1: 1 b
With a hat-tip to @Frank for simplifying the sub-select of dt2
.
Upvotes: 6
Reputation: 66819
How can I control it to return the 1st, 2nd, nth row, rather than defaulting to the last?
Not elegant, but sort-of works:
n = 3L
dt1[, v := dt2[.SD, on=.(id), x.letter[n], by=.EACHI]$V1]
A couple problems:
It doesn't select using GForce, eg as seen here:
> dt2[, letter[3], by=id, verbose=TRUE]
Detected that j uses these columns: letter
Finding groups using forderv ... 0.020sec
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000sec
lapply optimization is on, j unchanged as 'letter[3]'
GForce optimized j to '`g[`(letter, 3)'
Making each group and running j (GForce TRUE) ... 0.000sec
id V1
1: 1 c
If n
is outside of 1:.N
for some joined groups, no warning will be given:
n = 40L
dt1[, v := dt2[.SD, on=.(id), x.letter[n], by=.EACHI]$V1]
Alternately, make a habit of checking that i
in an update join x[i]
is "keyed" by the join columns:
cols = "id"
stopifnot(nrow(dt2) == uniqueN(dt2, by=cols))
And then make a different i
table to join on if appropriate
mDT = dt2[, .(letter = letter[3L]), by=id]
dt1[mDT, on=cols, v := i.letter]
Upvotes: 6