Stataq
Stataq

Reputation: 2307

How can I merge two dataframes together with some conditional requirements?

I have two dataframes, df1 and df2. I would like to join the two with the following conditions:

  1. merge df1 and df2 on gender and Test
  2. TestDate in df1 need to be within Date1 and Date2 from df2
  3. all.x = TRUE (keep df1 records)

How can I handle the second part?

Enter image description here

df1 <- structure(list(ID = c(1, 2, 3, 5, 4), Gender = c("F", "M", "M", 
"F", "F"), TestDate = structure(c(17897, 17898, 18630, 18262, 
17900), class = "Date"), Test = c("Weight", "Weight", "ELA", 
"ELA", "Math")), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))


df2 <- structure(list(Test = c("Weight", "Weight", "ELA", "ELA", "ELA", 
"ELA", "Math", "Math"), Gender = c("F", "M", "F", "M", "F", "M", 
"F", "M"), Date1 = structure(c(17532, 17534, 17536, 17537, 18266, 
18267, 17897, 17539), class = "Date"), Ave = c(97, 99, 85, 84, 
83, 82, 88, 89), Date2 = structure(c(18993, 18995, 18266, 18267, 
18997, 18998, 18999, 19000), class = "Date")), row.names = c(NA, 
-8L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 12

Views: 1311

Answers (4)

Rui Barradas
Rui Barradas

Reputation: 76651

Here is a tidyverse solution.

library(tidyverse)

inner_join(df1, df2) %>%
  filter(TestDate >= Date1 & TestDate <= Date2)
#> Joining, by = c("Gender", "Test")
#> # A tibble: 5 x 7
#>      ID Gender TestDate   Test   Date1        Ave Date2
#>   <dbl> <chr>  <date>     <chr>  <date>     <dbl> <date>
#> 1     1 F      2019-01-01 Weight 2018-01-01    97 2022-01-01
#> 2     2 M      2019-01-02 Weight 2018-01-03    99 2022-01-03
#> 3     3 M      2021-01-03 ELA    2020-01-06    82 2022-01-06
#> 4     5 F      2020-01-01 ELA    2018-01-05    85 2020-01-05
#> 5     4 F      2019-01-04 Math   2019-01-01    88 2022-01-07

Created on 2022-03-21 by the reprex package.

Upvotes: 7

Kevin Dialdestoro
Kevin Dialdestoro

Reputation: 306

If your condition #3 takes priority over condition #2, you may have to do a left join and check for missing value.

E.g., suppose we modify the last entry of df1:

df1[df1[,"ID"]==4, "Test"] <- "Chemistry"
df1

Enter image description here

Then, if we want condition #3 to take priority over condition #2:

left_join(df1, df2, by = c("Test", "Gender")) %>%
    filter((TestDate >= Date1 & TestDate <= Date2) | is.na(Ave)) %>%
    select(-c(Date1, Date2))

Enter image description here

Upvotes: 4

akrun
akrun

Reputation: 887851

We may use non-equi join

library(data.table)
setDT(df2)[df1, on = .(Gender, Test,  Date1 <= TestDate,  Date2 >= TestDate)]

-output

   Test Gender      Date1   Ave      Date2    ID
   <char> <char>     <Date> <num>     <Date> <num>
1: Weight      F 2019-01-01    97 2019-01-01     1
2: Weight      M 2019-01-02    99 2019-01-02     2
3:    ELA      M 2021-01-03    82 2021-01-03     3
4:    ELA      F 2020-01-01    85 2020-01-01     5
5:   Math      F 2019-01-04    88 2019-01-04     4

Upvotes: 7

VvdL
VvdL

Reputation: 3230

Does this work for you?

library(dplyr)
library(data.table)
merge(x = df1, 
      y = df2) %>% 
  filter(TestDate %between% list(Date1, Date2))

Upvotes: 7

Related Questions