Merge dataframes based on interval condition

Question

I have a dataframe like this

id start        end
1  20/06/88     24/07/89
1  27/07/89     13/04/93
1  14/04/93     6/09/95
2  3/01/92      11/02/94
2  30/03/94     16/04/96
2  17/04/96     18/08/97

that I would like to merge with this other dataframe

id date
1  26/08/88   
2  10/05/96

The resulting merged dataframe should look like this

id start        end         date
1  20/06/88     24/07/89    26/08/88
1  27/07/89     13/04/93    NA
1  14/04/93     6/09/95     NA
2  3/01/92      11/02/94    NA
2  30/03/94     16/04/96    NA
2  17/04/96     18/08/97    10/05/96

In other words, I want to merge the two dataframes on id, with the additional condition that date must lie within the interval spanned by start and end in the first dataframe.

Do you have any suggestions on how to do this? I tried the fuzzyjoin package, but I ran into memory issues.

Many thanks to everyone

Wietze314 · Accepted Answer

You can use sqldf for complex joins:


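library(sqldf)

# Left join: keep every row of df1 and attach df2.date where the ids match
# and the date falls strictly between start and end (endpoints excluded).
# Assumes start, end and date are Date (or numeric) columns; dd/mm/yy
# strings would be compared as text and give wrong matches.
sqldf("SELECT df1.*, df2.date, df2.id AS id2
       FROM df1
       LEFT JOIN df2
         ON df1.id = df2.id
        AND df1.start < df2.date
        AND df1.end > df2.date")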
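
For a reproducible run, here is a minimal sketch that builds the two dataframes from the question and applies the same join. It assumes the dates arrive as dd/mm/yy strings and converts them with as.Date first, since comparing the raw strings in SQL would not order them chronologically. The name res is only illustrative, and the id2 column is left out so the result has the columns shown in the question.

```r
library(sqldf)

# Sample data from the question; dates are parsed from dd/mm/yy strings.
df1 <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  start = as.Date(c("20/06/88", "27/07/89", "14/04/93",
                    "3/01/92", "30/03/94", "17/04/96"), format = "%d/%m/%y"),
  end   = as.Date(c("24/07/89", "13/04/93", "6/09/95",
                    "11/02/94", "16/04/96", "18/08/97"), format = "%d/%m/%y")
)
df2 <- data.frame(
  id   = c(1, 2),
  date = as.Date(c("26/08/88", "10/05/96"), format = "%d/%m/%y")
)

# Same left join as above: every df1 row is kept, and date is filled in
# only where the id matches and the date lies strictly inside the interval.
res <- sqldf("SELECT df1.*, df2.date
              FROM df1
              LEFT JOIN df2
                ON df1.id = df2.id
               AND df1.start < df2.date
               AND df1.end > df2.date")
res
```

If a date that falls exactly on start or end should also count as a match, the comparisons would need to be <= and >= instead.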
