Warrior
Warrior

Reputation: 223

Inner_join two dataframes when year on the RHS is 1-3 years after the Year on the LHS

I want to join two dataframes with index and Year as long as the Year on the RHS is 1-3 years after the Year on the LHS. For example, dataframe df_lhs is

A   index   Year
1   A   12/31/2012
3   B   12/31/2011
5   C   12/31/2009

the df_rhs is

B   index   Year
5   A   12/31/2001
6   B   12/31/2010
2   C   12/31/2011

I hope the resulting inner_join to contain:

A   index   Year_left   Year_right
5   C   12/31/2009  12/31/2011

This is what I tried

df = inner_join(df_lhs, df_rhs, by = c('index','Year'), suffix = c(".left", ".right"))

The code doesn't work. Maybe I should not think about using inner_join at all?

Upvotes: 1

Views: 42

Answers (2)

M--
M--

Reputation: 29134

library(dplyr)
library(tidyr)

df_lhs %>% 
  separate(Year, sep = "/", into = c("m", "d", "y"), remove = F) %>% 
  inner_join(., {df_rhs%>% 
                   separate(Year, sep = "/", into = c("m", "d", "y"), remove = F)}, 
             by = c('index','m', 'd'), suffix = c(".left", ".right")) %>% 
  filter((as.numeric(y.right) - as.numeric(y.left)) %in% 1:3) %>% 
  select(A, B, index, Year.left, Year.right)

#>   A B index  Year.left Year.right
#> 1 5 2     C 12/31/2009 12/31/2011

Upvotes: 3

ashwin agrawal
ashwin agrawal

Reputation: 1611

What you can do is do a simple join/merge, and then filter out the rows which satisfy your condition (here 1-3 years).

Below is the code for merging two data frames based on multiple IDs.

merge(df_lhs,data df_rhs,by=c("index","Year"))

After this you will get simple merge and then you can filter based on some condition like difference of dates between 1-3 years.

This is just a suggestion. I hope this helps.

Upvotes: 1

Related Questions