Reputation: 2307
I have two dataframes, df1
and df2
. I would like to join the two with the following conditions:
gender
and Test
TestDate
in df1
need to be within Date1
and Date2
from df2
all.x = TRUE
(keep df1 records)How can I handle the second part?
df1 <- structure(list(ID = c(1, 2, 3, 5, 4), Gender = c("F", "M", "M",
"F", "F"), TestDate = structure(c(17897, 17898, 18630, 18262,
17900), class = "Date"), Test = c("Weight", "Weight", "ELA",
"ELA", "Math")), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
df2 <- structure(list(Test = c("Weight", "Weight", "ELA", "ELA", "ELA",
"ELA", "Math", "Math"), Gender = c("F", "M", "F", "M", "F", "M",
"F", "M"), Date1 = structure(c(17532, 17534, 17536, 17537, 18266,
18267, 17897, 17539), class = "Date"), Ave = c(97, 99, 85, 84,
83, 82, 88, 89), Date2 = structure(c(18993, 18995, 18266, 18267,
18997, 18998, 18999, 19000), class = "Date")), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 12
Views: 1311
Reputation: 76651
Here is a tidyverse solution.
library(tidyverse)
inner_join(df1, df2) %>%
filter(TestDate >= Date1 & TestDate <= Date2)
#> Joining, by = c("Gender", "Test")
#> # A tibble: 5 x 7
#> ID Gender TestDate Test Date1 Ave Date2
#> <dbl> <chr> <date> <chr> <date> <dbl> <date>
#> 1 1 F 2019-01-01 Weight 2018-01-01 97 2022-01-01
#> 2 2 M 2019-01-02 Weight 2018-01-03 99 2022-01-03
#> 3 3 M 2021-01-03 ELA 2020-01-06 82 2022-01-06
#> 4 5 F 2020-01-01 ELA 2018-01-05 85 2020-01-05
#> 5 4 F 2019-01-04 Math 2019-01-01 88 2022-01-07
Created on 2022-03-21 by the reprex package.
Upvotes: 7
Reputation: 306
If your condition #3 takes priority over condition #2, you may have to do a left join and check for missing value.
E.g., suppose we modify the last entry of df1
:
df1[df1[,"ID"]==4, "Test"] <- "Chemistry"
df1
Then, if we want condition #3 to take priority over condition #2:
left_join(df1, df2, by = c("Test", "Gender")) %>%
filter((TestDate >= Date1 & TestDate <= Date2) | is.na(Ave)) %>%
select(-c(Date1, Date2))
Upvotes: 4
Reputation: 887851
We may use non-equi join
library(data.table)
setDT(df2)[df1, on = .(Gender, Test, Date1 <= TestDate, Date2 >= TestDate)]
-output
Test Gender Date1 Ave Date2 ID
<char> <char> <Date> <num> <Date> <num>
1: Weight F 2019-01-01 97 2019-01-01 1
2: Weight M 2019-01-02 99 2019-01-02 2
3: ELA M 2021-01-03 82 2021-01-03 3
4: ELA F 2020-01-01 85 2020-01-01 5
5: Math F 2019-01-04 88 2019-01-04 4
Upvotes: 7
Reputation: 3230
Does this work for you?
library(dplyr)
library(data.table)
merge(x = df1,
y = df2) %>%
filter(TestDate %between% list(Date1, Date2))
Upvotes: 7