Reputation: 1278
I'm working with two data frames. Data frame 1 includes a date, I'll refer to it as the merge_date
. Data frame 2 includes two dates, start date and end date.
I want to merge data frame 1 and 2 together based their ID
column and on whether the merge_date
is between the start and end date in Data frame 2.
For example:
############## Make Data Frame 1 #############
ID <- c(2,4,6,8,10)
DF_1_Start_Date <- as.Date(c("1912-01-01", "1945-09-20", "1934-07-01", "1967-12-23", "1949-05-19"), tryFormats = c("%Y-%m-%d"))
DF_1 <- data.frame(ID, merge_date = DF_1_Start_Date)
############## Make Data Frame 2 #############
ID <- c(2,4,6,8,10)
DF_2_Start_Date <- as.Date(c("1911-01-01", "1944-09-20", "1933-07-01", "1963-12-23", "1948-05-19"), tryFormats = c("%Y-%m-%d"))
DF_2_End_Date <- as.Date(c("1913-01-01", "1946-09-20", "1935-07-01", "1970-12-23", "1952-05-19"), tryFormats = c("%Y-%m-%d"))
DF_2 <- data.frame(ID, interval_start = DF_2_Start_Date, interval_end = DF_2_End_Date)
########### Data Frame 1 and 2 ################
> DF_1
ID merge_date
1 2 1912-01-01
2 4 1945-09-20
3 6 1934-07-01
4 8 1967-12-23
5 10 1949-05-19
> DF_2
ID interval_start interval_end
1 2 1911-01-01 1913-01-01
2 4 1944-09-20 1946-09-20
3 6 1933-07-01 1935-07-01
4 8 1963-12-23 1970-12-23
5 10 1948-05-19 1952-05-19
I would like the data frame to look like:
DF_3
ID merge_date interval_start interval_end
1 2 1912-01-01 1911-01-01 1913-01-01
2 4 1945-09-20 1944-09-20 1946-09-20
3 6 1934-07-01 1933-07-01 1935-07-01
4 8 1967-12-23 1963-12-23 1970-12-23
5 10 1949-05-19 1948-05-19 1952-05-19
So that the IDs align and the merge_date
merges between the interval_start
and the interval_end
dates.
Thank you!
Upvotes: 1
Views: 508
Reputation: 2987
Using sqldf
you could do:
library(sqldf)
sqldf("select * from DF_2 join DF_1 on merge_date >= interval_start and merge_date <= interval_end")
Or as mentioned by @G.Grothendieck, more directly you could do:
sqldf("select * from DF_2 Join DF_1 on merge_date between interval_start and interval_end")
Upvotes: 1
Reputation: 388817
We can use fuzzy_left_join
/fuzzy_inner_join
from fuzzyjoin
package
fuzzyjoin::fuzzy_left_join(DF_1, DF_2, by = c('ID' = 'ID',
'merge_date' = 'interval_start', 'merge_date' = 'interval_end'),
match_fun = list(`==`, `>=`, `<=`))
# ID.x merge_date ID.y interval_start interval_end
#1 2 1912-01-01 2 1911-01-01 1913-01-01
#2 4 1945-09-20 4 1944-09-20 1946-09-20
#3 6 1934-07-01 6 1933-07-01 1935-07-01
#4 8 1967-12-23 8 1963-12-23 1970-12-23
#5 10 1949-05-19 10 1948-05-19 1952-05-19
Upvotes: 0
Reputation: 3269
Another option would be:
library(dplyr)
DF_1 %>%
left_join(DF_2, by = 'ID') %>%
filter(merge_date >= interval_start & merge_date <= interval_end) -> DF_3
# ID merge_date interval_start interval_end
# 1 2 1912-01-01 1911-01-01 1913-01-01
# 2 4 1945-09-20 1944-09-20 1946-09-20
# 3 6 1934-07-01 1933-07-01 1935-07-01
# 4 8 1967-12-23 1963-12-23 1970-12-23
# 5 10 1949-05-19 1948-05-19 1952-05-19
Upvotes: 0
Reputation: 101099
Here is a base R solution, using merge()
and subset()
DF_3 <- subset(merge(DF_1,DF_2),
merge_date >= interval_start & merge_date <= interval_end)
such that
> DF_3
ID merge_date interval_start interval_end
1 2 1912-01-01 1911-01-01 1913-01-01
2 4 1945-09-20 1944-09-20 1946-09-20
3 6 1934-07-01 1933-07-01 1935-07-01
4 8 1967-12-23 1963-12-23 1970-12-23
5 10 1949-05-19 1948-05-19 1952-05-19
Upvotes: 2
Reputation: 886948
We can use a non-equi join
library(data.table)
DF_3 <- copy(DF_2)
setDT(DF_3)[DF_1, merge_date := merge_date,
on = .(ID, interval_start < merge_date, interval_end > merge_date)][]
Upvotes: 1