Alice Wang
Alice Wang

Reputation: 183

Conditional Left Join in dplyr using subset

I'm trying to perform a conditional left join between two dataframes using dplyr. My first dataframe, df1:

ID  Start Date End Date
1   1-1-2000   4-1-2020

My second dataframe, df2:

ID START_DT END_DT   Name
1  2-3-2002 3-1-2020 John Smith
1  4-2-2004 4-1-2021 Karen Anderson

I'm trying to join on ID where Start Date > START_DT AND End Date <= END_DT

ID START_DT END_DT   Name        Start Date  End Date
1  2-3-2002 3-1-2020 John Smith  1-1-2000    4-1-2020

I've tried this which gives me an error

new_df <- left_join(df2, df1, by = "ID") %>%
  subset(subset(new_df, `Start Date` > START_DT & `End Date` <= END_DT))

I've also tried to just do the join and then subset, which gives me

Warning messages:
1: In eval(e, x, parent.frame()) :
  Incompatible methods (">.POSIXt", "Ops.factor") for ">"
2: In eval(e, x, parent.frame()) :
  Incompatible methods ("<=.POSIXt", "Ops.factor") for "<="

Upvotes: 0

Views: 669

Answers (1)

Joshua Mire
Joshua Mire

Reputation: 736

As far as I could tell there were three issues with your code: you used subset() twice, referenced new_df in your dplyr pipe which is unnecessary and will cause issues, and your dates were not in R's default format.

The code below should work for you and hopefully clarify where your errors were.

new_df <- left_join(df2, df1, by = "ID") %>%
  subset(as.Date(Start Date, "%m-%d-%Y") > as.Date(START_DT, "%m-%d-%Y") & as.Date(End Date, "%m-%d-%Y") <= as.Date(END_DT, "%m-%d-%Y"))

Another alternative is to use filter():

new_df <- left_join(df2, df1, by = "ID") %>%
  filter(as.Date(Start Date, "%m-%d-%Y") > as.Date(START_DT, "%m-%d-%Y") & as.Date(End Date, "%m-%d-%Y") <= as.Date(END_DT, "%m-%d-%Y"))

I hope this helps!

Upvotes: 1

Related Questions