Reputation: 394
Is there a way to combine the join on and subset functionality in data.table? Say I have the following table:
dt = data.table(itemID = c(1,1,2,2),bucketID = c(1,2,2,3),value = 1:4)
I want to set the value to zero for the lowest bucket of each item. My thought was to run:
ends = dt[,.(min = min(bucketID)),itemID]
dt[ends,on="itemID",bucketID==min,value:=0]
i.e. join the tables, find where the two rows are identical, then update the value column. But this doesn't work. I can get the right results with:
ends = dt[,.(min = min(bucketID)),itemID]
dt = dt[ends,on="itemID"][bucketID==min,value:=0][,c(-4)]
However, this seems a bit circuitous. Is there a better way to combine join and where?
Upvotes: 4
Views: 63
Reputation: 26248
Expanding on your join approach, you can join by both the itemID
and the min
values
dt[
ends
, on = c("itemID", bucketID = "min")
, value := 0
]
dt
# itemID bucketID value
# 1: 1 1 0
# 2: 1 2 2
# 3: 2 2 0
# 4: 2 3 4
Upvotes: 5