Reputation: 34753
I'm trying to define a new variable in a data.table
through a merge. The wrinkle is that I'd like to access .N
in order to define the new variable by group, so I'd like to use by
as well, but this is causing an error.
MRE:
dt1<-data.table(pd=rep(1:2,each=3),rnk=rep(1:3,2),
var=c(3:1,1:3),key="pd")
dt2<-data.table(pd=c(1,2),chk=c(2,2),key="pd")
dt1[dt2,new:=var[.N]>i.chk,by=pd]
As you can see, I'd like to define new
to be TRUE
whenever the (within-pd
) highest-rnk
value of var
exceeds 2. The code above seems natural enough to me, but this results in an error: object 'i.chk' not found
(suggesting the merge has not been completed, as the name space of dt2
appears unavailable).
I can get around this with a second step:
> dt1[dt2,new:=var>i.chk][,new:=new[.N],by=pd][]
pd rnk var new
1: 1 1 3 FALSE
2: 1 2 2 FALSE
3: 1 3 1 FALSE
4: 2 1 1 TRUE
5: 2 2 2 TRUE
6: 2 3 3 TRUE
However, this slows down my code substantially since I'm using `:=`
to update around 6 such columns.
Is there no way to update by reference by group when merging?
Upvotes: 2
Views: 74
Reputation: 887691
You may use .EACHI
library(data.table)#v1.9.5+
dt1[dt2, new:=var[.N]>i.chk,.EACHI][]
# pd rnk var new
#1: 1 1 3 FALSE
#2: 1 2 2 FALSE
#3: 1 3 1 FALSE
#4: 2 1 1 TRUE
#5: 2 2 2 TRUE
#6: 2 3 3 TRUE
See ?data.table
and this SO question linked therein:
When
i
is adata.table
,DT[i,j,by=.EACHI]
evaluatesj
for the groups inDT
that each row ini
joins to. That is, you can join (ini
) and aggregate (inj
) simultaneously. We call this grouping by eachi
.
Also with the devel version of data.table
, you may not need to set the key, but can use on
.
dt1[dt2, new:=var[.N] > i.chk ,on='pd', by =.EACHI][]
# pd rnk var new
#1: 1 1 3 FALSE
#2: 1 2 2 FALSE
#3: 1 3 1 FALSE
#4: 2 1 1 TRUE
#5: 2 2 2 TRUE
#6: 2 3 3 TRUE
Upvotes: 3