Felix Zhao

Reputation: 489

Is there any way to join two data frames by date ranges?

I have two data frames. The first records the forecasted demand for each of the company's items over the 27 days following the forecast date, as shown below:

library(tidyverse)
library(lubridate)

daily_forecast <- data.frame(
  item=c("A","B","A","B"),
  date_fcsted=c("2020-8-1","2020-8-1","2020-8-15","2020-8-15"),
  fcsted_qty=c(100,200,200,100)
) %>% 
  mutate(date_fcsted=ymd(date_fcsted)) %>% 
  mutate(extended_date=date_fcsted+days(27))

The other dataset holds the actual daily demand for each item:

actual_orders <- data.frame(
  order_date=rep(seq(ymd("2020-8-3"),ymd("2020-9-15"),by = "1 week"),2),
  item=rep(c("A","B"),7),
  order_qty=round(rnorm(n=14,mean=50,sd=10),0)
)

What I am trying to accomplish is to get the actual total demand for each item between date_fcsted and extended_date in the first dataset, and then join it to the forecast so that I can calculate the forecast accuracy.

Solutions with tidyverse would be highly appreciated.

Upvotes: 1

Views: 182

Answers (2)

Ben

Reputation: 30474

You could also try fuzzy_left_join from the fuzzyjoin package, as suggested by @Gregor Thomas. I added a row number column so that each forecast row stays unique regardless of item and date range (this may not be needed).

library(fuzzyjoin)
library(dplyr)

daily_forecast %>%
  mutate(rn = row_number()) %>%
  fuzzy_left_join(actual_orders,
                  by = c("item" = "item",
                         "date_fcsted" = "order_date",
                         "extended_date" = "order_date"),
                  match_fun = list(`==`, `<=`, `>=`)) %>%
  group_by(rn, item.x, date_fcsted, extended_date, fcsted_qty) %>%
  summarise(actual_total_demand = sum(order_qty))

Output

     rn item.x date_fcsted extended_date fcsted_qty actual_total_demand
  <int> <chr>  <date>      <date>             <dbl>               <dbl>
1     1 A      2020-08-01  2020-08-28           100                 221
2     2 B      2020-08-01  2020-08-28           200                 219
3     3 A      2020-08-15  2020-09-11           200                 212
4     4 B      2020-08-15  2020-09-11           100                 216
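
To get from this table to the forecast accuracy the question asks about, one possible follow-up is sketched below. The question does not name a metric, so the absolute error and absolute percentage error used here are assumptions, as are the column names abs_error and abs_pct_error. The data frame simply re-enters the output above so that the block runs on its own.

```r
library(dplyr)

# Re-entered from the output above (these values depend on the random draws)
joined <- data.frame(
  item = c("A", "B", "A", "B"),
  date_fcsted = as.Date(c("2020-08-01", "2020-08-01", "2020-08-15", "2020-08-15")),
  fcsted_qty = c(100, 200, 200, 100),
  actual_total_demand = c(221, 219, 212, 216)
)

# Assumed accuracy metrics: absolute error and absolute percentage error
accuracy <- joined %>%
  mutate(
    abs_error = abs(fcsted_qty - actual_total_demand),
    abs_pct_error = abs_error / actual_total_demand
  )

print(accuracy)
```

If a different metric is wanted, only the mutate() step would need to change.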

Upvotes: 2

Ronak Shah

Reputation: 388817

You can try the following:

library(dplyr)

daily_forecast %>%
  left_join(actual_orders, by = 'item') %>%
  filter(order_date >= date_fcsted & order_date <= extended_date) %>%
  group_by(item, date_fcsted, extended_date, fcsted_qty) %>%
  summarise(value = sum(order_qty))

#  item  date_fcsted extended_date fcsted_qty value
#  <chr> <date>      <date>             <dbl> <dbl>
#1 A     2020-08-01  2020-08-28           100   179
#2 A     2020-08-15  2020-09-11           200   148
#3 B     2020-08-01  2020-08-28           200   190
#4 B     2020-08-15  2020-09-11           100   197
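
As a further sketch, assuming dplyr 1.1.0 or later is available, the range condition can go directly into the join with join_by(), so the intermediate item-only join is not needed. Unlike the filter() version, a left join keeps forecast rows that have no orders in their window (their total becomes 0 here because of na.rm = TRUE). The set.seed() call and the actual_total_demand column name are my additions, and base as.Date() is used in place of lubridate so that the block needs only dplyr.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the random order quantities are reproducible

daily_forecast <- data.frame(
  item = c("A", "B", "A", "B"),
  date_fcsted = as.Date(c("2020-08-01", "2020-08-01", "2020-08-15", "2020-08-15")),
  fcsted_qty = c(100, 200, 200, 100)
) %>%
  mutate(extended_date = date_fcsted + 27)

actual_orders <- data.frame(
  order_date = rep(seq(as.Date("2020-08-03"), as.Date("2020-09-15"), by = "1 week"), 2),
  item = rep(c("A", "B"), 7),
  order_qty = round(rnorm(n = 14, mean = 50, sd = 10), 0)
)

# Match on item and keep only orders dated within [date_fcsted, extended_date]
result <- daily_forecast %>%
  left_join(actual_orders,
            by = join_by(item,
                         date_fcsted <= order_date,
                         extended_date >= order_date)) %>%
  group_by(item, date_fcsted, extended_date, fcsted_qty) %>%
  summarise(actual_total_demand = sum(order_qty, na.rm = TRUE), .groups = "drop")

print(result)
```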

Upvotes: 3
