Reputation: 223
I want to join two dataframes with index and Year as long as the Year on the RHS is 1-3 years after the Year on the LHS. For example, dataframe df_lhs
is
A index Year
1 A 12/31/2012
3 B 12/31/2011
5 C 12/31/2009
the df_rhs
is
B index Year
5 A 12/31/2001
6 B 12/31/2010
2 C 12/31/2011
I hope the resulting inner_join to contain:
A index Year_left Year_right
5 C 12/31/2009 12/31/2011
This is what I tried
df = inner_join(df_lhs, df_rhs, by = c('index','Year'), suffix = c(".left", ".right"))
The code doesn't work. Maybe I should not think about using inner_join
at all?
Upvotes: 1
Views: 42
Reputation: 29134
library(dplyr)
library(tidyr)
df_lhs %>%
separate(Year, sep = "/", into = c("m", "d", "y"), remove = F) %>%
inner_join(., {df_rhs%>%
separate(Year, sep = "/", into = c("m", "d", "y"), remove = F)},
by = c('index','m', 'd'), suffix = c(".left", ".right")) %>%
filter((as.numeric(y.right) - as.numeric(y.left)) %in% 1:3) %>%
select(A, B, index, Year.left, Year.right)
#> A B index Year.left Year.right
#> 1 5 2 C 12/31/2009 12/31/2011
Upvotes: 3
Reputation: 1611
What you can do is do a simple join/merge, and then filter out the rows which satisfy your condition (here 1-3 years).
Below is the code for merging two data frames based on multiple IDs.
merge(df_lhs,data df_rhs,by=c("index","Year"))
After this you will get simple merge and then you can filter based on some condition like difference of dates between 1-3 years.
This is just a suggestion. I hope this helps.
Upvotes: 1