How to left join/merge in R based on two criteria: a numerical match and a date-range evaluation

Question

I have two data frames: summary and hauled. I want to create a new column in hauled that returns the values from summary$rfp_id based on 2 matching criteria:

1. Match summary$company_id with hauled$company_id.
2. Compare hauled$trans_date with summary$start_date and summary$end_date, and return whichever rfp_id matches most closely according to the possible outcomes explained below.

summary dataframe:

rfp_id	start_date	end_date	company_id
1	12/30/2022	2/28/2023	7
2	4/1/2022	6/30/2022	8
3	7/1/2022	8/30/2022	8
4	1/16/2022	1/16/2023	9
5	1/1/2023	2/6/2023	9

hauled dataframe (rfp_id = desired result):

trans#	company_id	trans_date	rfp_id
11	7	1/14/2023	1
12	8	7/2/2022	3
13	8	3/20/2022	2
14	8	9/1/2022	3
15	9	1/15/2023	5

The first example (trans# = 11) returns rfp_id = 1 since company 7 only appears once in summary and hauled$trans_date of 1/14/2023 falls between the start/end dates of 12/30/2022 and 2/28/2023.

The second example (trans# = 12) returns rfp_id = 3 since the trans_date of 7/2/2022 falls between 7/1/2022 and 8/30/2022 (and not between 4/1/2022 and 6/30/2022).

The third example (trans# = 13) returns rfp_id = 2 because the trans_date of 3/20/2022 falls outside both start/end date ranges, but it is closest to the start_date of 4/1/2022.

The fourth example (trans# = 14) returns rfp_id = 3 because the trans_date of 9/1/2022 falls outside both start/end date ranges, but it is closest to the end_date of 8/30/2022.

The fifth example (trans# = 15) returns rfp_id = 5 because when a trans_date falls within two or more start/end date ranges, the rfp_id to return is whichever has the latest end_date.

I don't have a strong background in R. Most of what I have tried has come from ChatGPT, and the code it produces keeps throwing a 'many-to-many' error when executed.


Answers (1)
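One possible approach, sketched in base R under a few assumptions that are mine rather than the asker's: the summary table is named summary_df so it does not mask base::summary(), the trans# column is called trans_id, and dates are month/day/year. The 'many-to-many' message most likely comes from joining on company_id alone, since company 8 has several rows in both tables. Here that expansion is done on purpose, each candidate pair is scored by how far the trans_date lies from the range (0 when inside it), and only the best candidate per transaction is kept, with ties broken by the latest end_date.

```r
# Sample data copied from the question; dates parsed as month/day/year.
fmt <- "%m/%d/%Y"
summary_df <- data.frame(   # named summary_df to avoid masking base::summary()
  rfp_id     = 1:5,
  start_date = as.Date(c("12/30/2022", "4/1/2022", "7/1/2022", "1/16/2022", "1/1/2023"), format = fmt),
  end_date   = as.Date(c("2/28/2023", "6/30/2022", "8/30/2022", "1/16/2023", "2/6/2023"), format = fmt),
  company_id = c(7, 8, 8, 9, 9)
)
hauled <- data.frame(
  trans_id   = 11:15,       # stands in for the "trans#" column
  company_id = c(7, 8, 8, 8, 9),
  trans_date = as.Date(c("1/14/2023", "7/2/2022", "3/20/2022", "9/1/2022", "1/15/2023"), format = fmt)
)

# Step 1: pair every transaction with every date range of the same company.
# This is the deliberate many-to-many expansion (9 candidate rows here).
candidates <- merge(hauled, summary_df, by = "company_id")

# Step 2: distance in days from trans_date to each range, 0 when inside it.
inside <- candidates$trans_date >= candidates$start_date &
  candidates$trans_date <= candidates$end_date
gap <- pmin(
  abs(as.numeric(candidates$trans_date - candidates$start_date)),
  abs(as.numeric(candidates$trans_date - candidates$end_date))
)
candidates$distance <- ifelse(inside, 0, gap)

# Step 3: sort so the best candidate comes first within each transaction:
# smallest distance, then latest end_date as the tie-breaker.
candidates <- candidates[order(candidates$trans_id,
                               candidates$distance,
                               -as.numeric(candidates$end_date)), ]
best <- candidates[!duplicated(candidates$trans_id), c("trans_id", "rfp_id")]

# Step 4: left join the chosen rfp_id back onto hauled.
# Transactions whose company has no row in summary_df would get NA.
result <- merge(hauled, best, by = "trans_id", all.x = TRUE)
print(result)
# rfp_id column: 1 3 2 3 5, matching the desired result in the question
```

If you prefer dplyr, the same idea should carry over: join on company_id while explicitly allowing the many-to-many relationship, compute the distance column, then keep one row per transaction.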
