Sharif Amlani
Sharif Amlani

Reputation: 1278

How to merge data frame based on date between two intervals of time?

I'm working with two data frames. Data frame 1 includes a date, I'll refer to it as the merge_date. Data frame 2 includes two dates, start date and end date.

I want to merge data frame 1 and 2 together based their ID column and on whether the merge_date is between the start and end date in Data frame 2.

For example:

############## Make Data Frame 1 #############
ID <- c(2,4,6,8,10)
DF_1_Start_Date <- as.Date(c("1912-01-01", "1945-09-20", "1934-07-01", "1967-12-23", "1949-05-19"), tryFormats = c("%Y-%m-%d"))

DF_1 <- data.frame(ID, merge_date = DF_1_Start_Date)

############## Make Data Frame 2 #############

ID <- c(2,4,6,8,10)
DF_2_Start_Date <- as.Date(c("1911-01-01", "1944-09-20", "1933-07-01", "1963-12-23", "1948-05-19"), tryFormats = c("%Y-%m-%d"))

DF_2_End_Date <- as.Date(c("1913-01-01", "1946-09-20", "1935-07-01", "1970-12-23", "1952-05-19"), tryFormats = c("%Y-%m-%d"))

DF_2 <- data.frame(ID, interval_start = DF_2_Start_Date, interval_end = DF_2_End_Date)

########### Data Frame 1 and 2 ################
> DF_1
  ID merge_date
1  2 1912-01-01
2  4 1945-09-20
3  6 1934-07-01
4  8 1967-12-23
5 10 1949-05-19
> DF_2
  ID interval_start interval_end
1  2     1911-01-01   1913-01-01
2  4     1944-09-20   1946-09-20
3  6     1933-07-01   1935-07-01
4  8     1963-12-23   1970-12-23
5 10     1948-05-19   1952-05-19

I would like the data frame to look like:

DF_3
  ID merge_date   interval_start interval_end
1  2 1912-01-01   1911-01-01     1913-01-01
2  4 1945-09-20   1944-09-20     1946-09-20
3  6 1934-07-01   1933-07-01     1935-07-01
4  8 1967-12-23   1963-12-23     1970-12-23
5 10 1949-05-19   1948-05-19     1952-05-19

So that the IDs align and the merge_date merges between the interval_start and the interval_end dates.

Thank you!

Upvotes: 1

Views: 508

Answers (5)

Matt
Matt

Reputation: 2987

Using sqldf you could do:

library(sqldf)

sqldf("select * from DF_2 join DF_1 on merge_date >= interval_start and merge_date <= interval_end")

Or as mentioned by @G.Grothendieck, more directly you could do:

sqldf("select * from DF_2 Join DF_1 on merge_date between interval_start and interval_end")

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388817

We can use fuzzy_left_join/fuzzy_inner_join from fuzzyjoin package

fuzzyjoin::fuzzy_left_join(DF_1, DF_2, by = c('ID' = 'ID', 
    'merge_date' = 'interval_start', 'merge_date' = 'interval_end'), 
     match_fun = list(`==`, `>=`, `<=`))

#  ID.x merge_date ID.y interval_start interval_end
#1    2 1912-01-01    2     1911-01-01   1913-01-01
#2    4 1945-09-20    4     1944-09-20   1946-09-20
#3    6 1934-07-01    6     1933-07-01   1935-07-01
#4    8 1967-12-23    8     1963-12-23   1970-12-23
#5   10 1949-05-19   10     1948-05-19   1952-05-19

Upvotes: 0

AlexB
AlexB

Reputation: 3269

Another option would be:

library(dplyr)
DF_1 %>%
  left_join(DF_2, by = 'ID') %>%
  filter(merge_date >= interval_start & merge_date <= interval_end) -> DF_3

# ID merge_date interval_start interval_end
# 1  2 1912-01-01     1911-01-01   1913-01-01
# 2  4 1945-09-20     1944-09-20   1946-09-20
# 3  6 1934-07-01     1933-07-01   1935-07-01
# 4  8 1967-12-23     1963-12-23   1970-12-23
# 5 10 1949-05-19     1948-05-19   1952-05-19

Upvotes: 0

ThomasIsCoding
ThomasIsCoding

Reputation: 101099

Here is a base R solution, using merge() and subset()

DF_3 <- subset(merge(DF_1,DF_2), 
               merge_date >= interval_start & merge_date <= interval_end)

such that

> DF_3
  ID merge_date interval_start interval_end
1  2 1912-01-01     1911-01-01   1913-01-01
2  4 1945-09-20     1944-09-20   1946-09-20
3  6 1934-07-01     1933-07-01   1935-07-01
4  8 1967-12-23     1963-12-23   1970-12-23
5 10 1949-05-19     1948-05-19   1952-05-19

Upvotes: 2

akrun
akrun

Reputation: 886948

We can use a non-equi join

library(data.table)
DF_3 <- copy(DF_2)
setDT(DF_3)[DF_1,  merge_date := merge_date, 
       on = .(ID, interval_start < merge_date, interval_end > merge_date)][]

Upvotes: 1

Related Questions