Rolling join without duplicates using data.table

Question

Imagine the following datasets:

dt.units <- setDT(structure(list(Tienda = c(55596L, 55596L, 55596L, 55596L), `Bulk no.` = c(9021L, 
9021L, 9021L, 9021L), Date = structure(c(18824, 18894, 18927, 
18955), class = "Date"), Units = c(77L, 36L, 37L, 8L)), row.names = c(NA, 
-4L), class = c("data.table", "data.frame")))

   Tienda Bulk no.       Date Units
1:  55596     9021 2021-07-16    77
2:  55596     9021 2021-09-24    36
3:  55596     9021 2021-10-27    37
4:  55596     9021 2021-11-24     8

dt.invoices <- setDT(structure(list(TIENDA = c(55596, 55596, 55596), BULTO = c(9021L, 
9021L, 9021L), Date = structure(c(18828, 18894, 18932), class = "Date"), 
    Invoiced = c(77L, 36L, 37L)), row.names = c(NA, -3L), class = c("data.table", 
"data.frame")))

   TIENDA BULTO       Date Invoiced
1:  55596  9021 2021-07-20       77
2:  55596  9021 2021-09-24       35
3:  55596  9021 2021-11-01       37

I am trying to join them in the following way:

setkeyv(dt.invoices, c("TIENDA", "BULTO", "Date"))
setkeyv(dt.units, c("Tienda", "Bulk no.", "Date"))
dt.data <- dt.invoices[dt.units, roll = "nearest"]

The output will look like this:

   TIENDA BULTO       Date Invoiced Units
1:  55596  9021 2021-07-16       77    77
2:  55596  9021 2021-09-24       35    36
3:  55596  9021 2021-10-27       37    37
4:  55596  9021 2021-11-24       37     8

In the last row I don't want the 37, I want 0/NA. Turning my join around to dt.units[dt.invoices, roll = "nearest"] is not a solution either because then it will remove the 4th record alltogether (and keep the wrong dates).

In short my desired output would look like this:

   TIENDA BULTO       Date Invoiced Units
1:  55596  9021 2021-07-16       77    77
2:  55596  9021 2021-09-24       35    36
3:  55596  9021 2021-10-27       37    37
4:  55596  9021 2021-11-24       NA     8

Rolling join without duplicates using data.table

Answers (1)

Related Questions