Reputation: 183
I'm trying to perform a conditional left join between two dataframes using dplyr. My first dataframe, df1:
ID Start Date End Date
1 1-1-2000 4-1-2020
My second dataframe, df2:
ID START_DT END_DT Name
1 2-3-2002 3-1-2020 John Smith
1 4-2-2004 4-1-2021 Karen Anderson
I'm trying to join on ID where Start Date > START_DT AND End Date <= END_DT
ID START_DT END_DT Name Start Date End Date
1 2-3-2002 3-1-2020 John Smith 1-1-2000 4-1-2020
I've tried this which gives me an error
new_df <- left_join(df2, df1, by = "ID") %>%
subset(subset(new_df, `Start Date` > START_DT & `End Date` <= END_DT))
I've also tried to just do the join and then subset, which gives me
Warning messages:
1: In eval(e, x, parent.frame()) :
Incompatible methods (">.POSIXt", "Ops.factor") for ">"
2: In eval(e, x, parent.frame()) :
Incompatible methods ("<=.POSIXt", "Ops.factor") for "<="
Upvotes: 0
Views: 669
Reputation: 736
As far as I could tell there were three issues with your code: you used subset()
twice, referenced new_df
in your dplyr
pipe which is unnecessary and will cause issues, and your dates were not in R's default format.
The code below should work for you and hopefully clarify where your errors were.
new_df <- left_join(df2, df1, by = "ID") %>%
subset(as.Date(Start Date, "%m-%d-%Y") > as.Date(START_DT, "%m-%d-%Y") & as.Date(End Date, "%m-%d-%Y") <= as.Date(END_DT, "%m-%d-%Y"))
Another alternative is to use filter()
:
new_df <- left_join(df2, df1, by = "ID") %>%
filter(as.Date(Start Date, "%m-%d-%Y") > as.Date(START_DT, "%m-%d-%Y") & as.Date(End Date, "%m-%d-%Y") <= as.Date(END_DT, "%m-%d-%Y"))
I hope this helps!
Upvotes: 1