Merge and assign by reference, `by` ID

Question

I'm trying to define a new variable in a data.table through a merge. The wrinkle is that I'd like to access .N in order to define the new variable by group, so I'd like to use by as well, but this is causing an error.

MRE:

dt1<-data.table(pd=rep(1:2,each=3),rnk=rep(1:3,2),
                var=c(3:1,1:3),key="pd")
dt2<-data.table(pd=c(1,2),chk=c(2,2),key="pd")

dt1[dt2,new:=var[.N]>i.chk,by=pd]

As you can see, I'd like to define new to be TRUE whenever the (within-pd) highest-rnk value of var exceeds 2. The code above seems natural enough to me, but this results in an error: object 'i.chk' not found (suggesting the merge has not been completed, as the name space of dt2 appears unavailable).

I can get around this with a second step:

> dt1[dt2,new:=var>i.chk][,new:=new[.N],by=pd][]
   pd rnk var   new
1:  1   1   3 FALSE
2:  1   2   2 FALSE
3:  1   3   1 FALSE
4:  2   1   1  TRUE
5:  2   2   2  TRUE
6:  2   3   3  TRUE

However, this slows down my code substantially since I'm using `:=` to update around 6 such columns.

Is there no way to update by reference by group when merging?

akrun · Accepted Answer

You may use .EACHI

library(data.table)#v1.9.5+ 
dt1[dt2, new:=var[.N]>i.chk,.EACHI][]
#   pd rnk var   new
#1:  1   1   3 FALSE
#2:  1   2   2 FALSE
#3:  1   3   1 FALSE
#4:  2   1   1  TRUE
#5:  2   2   2  TRUE
#6:  2   3   3  TRUE

See ?data.table and this SO question linked therein:

When i is a data.table, DT[i,j,by=.EACHI] evaluates j for the groups in DT that each row in i joins to. That is, you can join (in i) and aggregate (in j) simultaneously. We call this grouping by each i.

Also with the devel version of data.table, you may not need to set the key, but can use on.

 dt1[dt2, new:=var[.N] > i.chk ,on='pd', by =.EACHI][]
 #  pd rnk var   new
 #1:  1   1   3 FALSE
 #2:  1   2   2 FALSE
 #3:  1   3   1 FALSE
 #4:  2   1   1  TRUE
 #5:  2   2   2  TRUE
 #6:  2   3   3  TRUE

Merge and assign by reference, `by` ID

Answers (1)

Related Questions