Is it possible to merge based on many conditions?

Question

What I'd like to achieve is to be able to compare the data based on a date and if the date is between the range then take the lowest "PDF2" value.

This is an example of the two dataframe I'm working with. I want to check if the data of the column "R" from "df" is found in the column "R" of "df2" check if the date is between the range of df2 and if there is any conflict or duplicate I want to always keep the lowest value of "PDF2".

df <- data.frame("D" = c("01/01/2019", "01/02/2019", "01/03/2019", "01/12/2019"),
             "R" = c("ABC123", "ABC123", "ABC123", "ABC1"),
             "PDF" = c(1.23, 1.23, 1.23, 1.23),
             stringsAsFactors = FALSE)

df2 <- data.frame("DD" = c("01/01/2019", "01/02/2019", "01/01/2019"),
              "DF" = c("01/02/2019", "01/03/2019", "01/11/2019"),
              "R" = c("ABC123", "ABC123", "ABC1"),
              "PDF2" = c(1.12, 1.11, 1.12),
              stringsAsFactors = FALSE)

This is the result I'm expecting.

result <- data.frame("R" = c("ABC123", "ABC123", "ABC123"),
                 "D" = c("01/01/2019", "01/02/2019", "01/03/2019"),
                 "DD" = c("01/01/2019", "01/02/2019", "01/02/2019"),
                 "DF" = c("01/02/2019", "01/03/2019", "01/03/2019"),
                 "PDF" = c(1.23, 1.23, 1.23),
                 "PDF2" = c(1.12, 1.11, 1.11),
                 stringsAsFactors = FALSE)

As you can see the "ABC1" isn't in the result because the date isn't in the range.

My current problem is to only keep the lowest value if there are duplicate or conflict for the date range.

Here is a sample of my current code :

temp <- merge(df, df2, by = "R")
myd <- which(as.Date(temp$D, format = "%d/%m/%Y") <= as.Date(temp$DF, format = "%d/%m/%Y"))
myd2 <- which(as.Date(temp$D, format = "%d/%m/%Y") >= as.Date(temp$DD, format = "%d/%m/%Y"))
myd <- myd[myd %in% myd2]
if (length(myd)) {
  temp <- temp[myd,]
}

And also how can I get the lines that didn't match the requirements in a separate dataframe ?

Jeroen Colin · Accepted Answer

I think the answers to this question might help you:

How to find matches for a row in a dataframe conditional on many rows from another dataframe

df %>% 
  left_join(df2, by = "R") %>% 
  filter(lubridate::dmy(D) >= lubridate::dmy(DD) & lubridate::dmy(D) <= lubridate::dmy(DF)) %>% 
  group_by(R,D) %>% 
  filter(PDF2 == min(PDF2)) %>% 
  ungroup()

Is it possible to merge based on many conditions?

Answers (2)

Related Questions