SymbolixAU
SymbolixAU

Reputation: 26258

Update join with multiple rows

Question

When doing an update-join, where the i table has multiple rows per key, how can you control which row is returned?

Example

In this example, the update-join returns the last row from dt2

library(data.table) 

dt1 <- data.table(id = 1) 
dt2 <- data.table(id = 1, letter = letters) 

dt1[ 
    dt2 
    , on = "id" 
    , letter := i.letter 
    ] 

dt1
#    id letter
# 1:  1      z

How can I control it to return the 1st, 2nd, nth row, rather than defaulting to the last?


References

A couple of references similar to this by user @Frank

Upvotes: 11

Views: 313

Answers (2)

thelatemail
thelatemail

Reputation: 93938

The most flexible idea I can think of is to only join the part of dt2 which contains the rows you want. So, for the second row:

dt1[ 
    dt2[, .SD[2], by=id]
    , on = "id" 
    , letter := i.letter
    ]

dt1
#   id letter
#1:  1      b

With a hat-tip to @Frank for simplifying the sub-select of dt2.

Upvotes: 6

Frank
Frank

Reputation: 66819

How can I control it to return the 1st, 2nd, nth row, rather than defaulting to the last?

Not elegant, but sort-of works:

n = 3L
dt1[, v := dt2[.SD, on=.(id), x.letter[n], by=.EACHI]$V1]

A couple problems:

  1. It doesn't select using GForce, eg as seen here:

    > dt2[, letter[3], by=id, verbose=TRUE]
    Detected that j uses these columns: letter 
    Finding groups using forderv ... 0.020sec 
    Finding group sizes from the positions (can be avoided to save RAM) ... 0.000sec 
    lapply optimization is on, j unchanged as 'letter[3]'
    GForce optimized j to '`g[`(letter, 3)'
    Making each group and running j (GForce TRUE) ... 0.000sec 
       id V1
    1:  1  c
    
  2. If n is outside of 1:.N for some joined groups, no warning will be given:

    n = 40L
    dt1[, v := dt2[.SD, on=.(id), x.letter[n], by=.EACHI]$V1]
    

Alternately, make a habit of checking that i in an update join x[i] is "keyed" by the join columns:

cols = "id"
stopifnot(nrow(dt2) == uniqueN(dt2, by=cols))

And then make a different i table to join on if appropriate

mDT = dt2[, .(letter = letter[3L]), by=id]
dt1[mDT, on=cols, v := i.letter]

Upvotes: 6

Related Questions