Reputation: 475
I have table 1 with customer ids. For each customer, there is a start and end date (validity period of the promotion). Table 1 looks like below:
ID | start_date | end_date |
---|---|---|
1 | 2022-01-01 | 2022-01-03 |
2 | 2022-10-15 | 2022-10-18 |
3 | 2022-02-03 | 2022-02-05 |
The goal is to check from these customers, those who did a purchase during the interval of time between start and end_date. For this, we need to check the table of all purchases, which looks like this:
table 2:
id | date_purchase |
---|---|
1 | 2022-01-05 |
1 | 2022-01-02 |
7 | 2022-01-01 |
8 | 2022-01-01 |
Our final result should label only id=1 as someone who participated in the campaign, because the purchase date was during the start_date and end_date relative to this customer (row2 in table 2)
The pseudo code I am trying for this is the following: For each ID in table 1, check if it exists in table2. If yes, check if there is any row having date purchase associated to this id, that is between start_date and end_date related to this customer. If yes, flag it with mutate (as beneficiary for example).
Is there a simple way to do this using dplyr?
Upvotes: 0
Views: 23
Reputation: 24742
If df1
and df2
are your dataframe as above, then you can do this with dplyr
like this, but I would not recommend it.
library(dplyr)
inner_join(df1,df2, c("ID"="id")) %>%
filter(date_purchase>=start_date, date_purchase<=end_date)
The appropriate way to do this is with a non-equi join, which can be done with data.table
library(data.table)
setDT(df1)
setDT(df2)
df2[df1,on=.(id=ID, date_purchase>=start_date, date_purchase<=end_date), nomatch=0]
Upvotes: 1