hannes101
hannes101

Reputation: 2528

Filter data.table on condition per group

I need to incrementally update a SQL database and therefore receive the following data.table as input.

library(data.table)
dt1 <- data.table(Category = letters[1:4]
                  , Max.Date = rep(as.Date("2018-01-01"),4))

I now would like to filter the data.table in R to select all Categories, which do have a later date in my data.table dt2, but of course the filtering should be done on the level of the individual category, so that the dates are only compared per category and not for the whole data.table

dt2 <- data.table(Category = letters[1:8]
                  , Max.Date = rep(as.Date("2019-01-01"),8))

The desired output should select from dt2, where the Max.Date is larger than in dt1, so the desired output should be:

   dt.desired 
   Category   Max.Date
1:        a 2019-01-01
2:        b 2019-01-01
3:        c 2019-01-01
4:        d 2019-01-01

So selecting from dt2, where the date per category is larger than in dt1.

Upvotes: 0

Views: 221

Answers (1)

Wimpel
Wimpel

Reputation: 27792

The below non-equi join should work

dt2[ dt1, 
     .( Category, Max.Date = x.Max.Date ), 
     on = .( Category, Max.Date > Max.Date ) ][]

should work, resulting in:

#    Category   Max.Date
# 1:        a 2019-01-01
# 2:        b 2019-01-01
# 3:        c 2019-01-01
# 4:        d 2019-01-01

Upvotes: 2

Related Questions