check eligible IDs based on conditions in another table with dplyr

Question

I have table 1 with customer ids. For each customer, there is a start and end date (validity period of the promotion). Table 1 looks like below:

ID	start_date	end_date
1	2022-01-01	2022-01-03
2	2022-10-15	2022-10-18
3	2022-02-03	2022-02-05

The goal is to check from these customers, those who did a purchase during the interval of time between start and end_date. For this, we need to check the table of all purchases, which looks like this:

table 2:

id	date_purchase
1	2022-01-05
1	2022-01-02
7	2022-01-01
8	2022-01-01

Our final result should label only id=1 as someone who participated in the campaign, because the purchase date was during the start_date and end_date relative to this customer (row2 in table 2)

The pseudo code I am trying for this is the following: For each ID in table 1, check if it exists in table2. If yes, check if there is any row having date purchase associated to this id, that is between start_date and end_date related to this customer. If yes, flag it with mutate (as beneficiary for example).

Is there a simple way to do this using dplyr?

langtang · Accepted Answer

If df1 and df2 are your dataframe as above, then you can do this with dplyr like this, but I would not recommend it.

library(dplyr)
inner_join(df1,df2, c("ID"="id")) %>% 
  filter(date_purchase>=start_date, date_purchase<=end_date)

The appropriate way to do this is with a non-equi join, which can be done with data.table

library(data.table)
setDT(df1)
setDT(df2)
df2[df1,on=.(id=ID, date_purchase>=start_date, date_purchase<=end_date), nomatch=0]

check eligible IDs based on conditions in another table with dplyr

Answers (1)

Related Questions