Reputation: 2783
Imagine the following datasets:
dt.units <- setDT(structure(list(Tienda = c(55596L, 55596L, 55596L, 55596L), `Bulk no.` = c(9021L,
9021L, 9021L, 9021L), Date = structure(c(18824, 18894, 18927,
18955), class = "Date"), Units = c(77L, 36L, 37L, 8L)), row.names = c(NA,
-4L), class = c("data.table", "data.frame")))
Tienda Bulk no. Date Units
1: 55596 9021 2021-07-16 77
2: 55596 9021 2021-09-24 36
3: 55596 9021 2021-10-27 37
4: 55596 9021 2021-11-24 8
dt.invoices <- setDT(structure(list(TIENDA = c(55596, 55596, 55596), BULTO = c(9021L,
9021L, 9021L), Date = structure(c(18828, 18894, 18932), class = "Date"),
Invoiced = c(77L, 36L, 37L)), row.names = c(NA, -3L), class = c("data.table",
"data.frame")))
TIENDA BULTO Date Invoiced
1: 55596 9021 2021-07-20 77
2: 55596 9021 2021-09-24 35
3: 55596 9021 2021-11-01 37
I am trying to join them in the following way:
setkeyv(dt.invoices, c("TIENDA", "BULTO", "Date"))
setkeyv(dt.units, c("Tienda", "Bulk no.", "Date"))
dt.data <- dt.invoices[dt.units, roll = "nearest"]
The output will look like this:
TIENDA BULTO Date Invoiced Units
1: 55596 9021 2021-07-16 77 77
2: 55596 9021 2021-09-24 35 36
3: 55596 9021 2021-10-27 37 37
4: 55596 9021 2021-11-24 37 8
In the last row I don't want the 37, I want 0/NA. Turning my join around to dt.units[dt.invoices, roll = "nearest"]
is not a solution either because then it will remove the 4th record alltogether (and keep the wrong dates).
In short my desired output would look like this:
TIENDA BULTO Date Invoiced Units
1: 55596 9021 2021-07-16 77 77
2: 55596 9021 2021-09-24 35 36
3: 55596 9021 2021-10-27 37 37
4: 55596 9021 2021-11-24 NA 8
Upvotes: 1
Views: 82
Reputation: 27792
# rolling join, also take x.Date to filter out duplicates later in the process
dt.units[, c("Date_2", "Invoiced") := dt.invoices[dt.units, .(x.Date, Invoiced), roll = "nearest"]][]
# Tienda Bulk no. Date Units Date_2 Invoiced
# 1: 55596 9021 2021-07-16 77 2021-07-20 77
# 2: 55596 9021 2021-09-24 36 2021-09-24 36
# 3: 55596 9021 2021-10-27 37 2021-11-01 37
# 4: 55596 9021 2021-11-24 8 2021-11-01 37 !!<-- duplicate Date_2
#set Invoices to NA on duplicated Date_2 rows
dt.units[duplicated(Date_2), Invoiced := NA_integer_]
# remove temp column Date_2
dt.units[, Date_2 := NULL][]
# Tienda Bulk no. Date Units Invoiced
# 1: 55596 9021 2021-07-16 77 77
# 2: 55596 9021 2021-09-24 36 36
# 3: 55596 9021 2021-10-27 37 37
# 4: 55596 9021 2021-11-24 8 NA
update based on comment below
# rolling join, also take x.Date to filter out duplicates later in the process
dt.units[, c("Date_2", "Invoiced") := dt.invoices[dt.units, .(x.Date, Invoiced), roll = "nearest"]][]
# Tienda Bulk no. Date Units Date_2 Invoiced
# 1: 55596 9021 2021-07-16 77 2021-07-20 77
# 2: 55596 9021 2021-09-24 36 2021-09-24 36
# 3: 55596 9021 2021-10-27 37 2021-11-01 37
# 4: 55596 9021 2021-11-24 8 2021-11-01 37
dt.units[ dt.units[, .I[.N > 1 & !Date == min(Date)],
by = .(Tienda, `Bulk no.`, Date_2)]$V1,
Invoiced := NA_integer_][, Date_2 := NULL][]
# Tienda Bulk no. Date Units Invoiced
# 1: 55596 9021 2021-07-16 77 77
# 2: 55596 9021 2021-09-24 36 36
# 3: 55596 9021 2021-10-27 37 37
# 4: 55596 9021 2021-11-24 8 NA
Upvotes: 1