Reputation: 81
I want to merge two data frames df1 and df2.
df1 <- tibble(x = c("FIDELITY FREEDOM 2015 FUND", "VANGUARD WELLESLEY INCOME FUND"), y = c(1, 2))
df2 <- tibble(x = c("FIDELITY ABERDEEN STREET TRUST: FIDELITY FREEDOM 2015 FUND", "VANGUARD/WELLESLEY INCOME FUND, INC: VANGUARD WELLESLEY INCOME FUND; INVESTOR SHARES"), z = c(2020, 2021))
I want to merge df1 and df2 based on x. I am currently trying fuzzy matching with:
fuzzy_join(df1,df2,match_fun = function(x,y) grepl(x, y))
It gives me the following warning:
In grepl(x, y) :
argument 'pattern' has length > 1 and only the first element will be used.
Do you have any ideas for merging df1 and df2? I am thinking about how to write the match_fun function but I am not sure how to progress. Thank you so much!
Upvotes: 2
Views: 576
Reputation: 79204
We could use either fuzzy_inner_join or regex_inner_join from the fuzzyjoin package.
library(fuzzyjoin)
library(stringr)
df2 %>% fuzzy_inner_join(df1, by = "x", match_fun = str_detect)
  x.x                                                                                      z x.y                                y
  <chr>                                                                                <dbl> <chr>                          <dbl>
1 FIDELITY ABERDEEN STREET TRUST: FIDELITY FREEDOM 2015 FUND                            2020 FIDELITY FREEDOM 2015 FUND         1
2 VANGUARD/WELLESLEY INCOME FUND, INC: VANGUARD WELLESLEY INCOME FUND; INVESTOR SHARES  2021 VANGUARD WELLESLEY INCOME FUND     2
or:
library(fuzzyjoin)
library(dplyr)  # provides %>% in case it is not already attached
df2 %>% regex_inner_join(df1, by = "x")
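As for why the original attempt warned: match_fun is called with two vectors of candidate pairs, and grepl is vectorized over the text but not over the pattern, so only the first pattern was used. str_detect is vectorized over both, which is why it works above. If the fund names might contain regex metacharacters such as parentheses, a literal (non-regex) match may be safer. Here is a minimal base R sketch of that idea; contains_literal is a hypothetical helper name of my own, not part of fuzzyjoin:

```r
library(tibble)
library(fuzzyjoin)

df1 <- tibble(
  x = c("FIDELITY FREEDOM 2015 FUND", "VANGUARD WELLESLEY INCOME FUND"),
  y = c(1, 2)
)
df2 <- tibble(
  x = c(
    "FIDELITY ABERDEEN STREET TRUST: FIDELITY FREEDOM 2015 FUND",
    "VANGUARD/WELLESLEY INCOME FUND, INC: VANGUARD WELLESLEY INCOME FUND; INVESTOR SHARES"
  ),
  z = c(2020, 2021)
)

# Hypothetical helper: for each pair (long[i], short[i]), TRUE when short[i]
# occurs literally inside long[i]. mapply() supplies the pairwise
# vectorization that a bare grepl(pattern, x) call lacks.
contains_literal <- function(long, short) {
  mapply(grepl, short, long,
         MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
}

# df2 goes first so that its (longer) names arrive as the `long` argument.
res <- fuzzy_inner_join(df2, df1, by = "x", match_fun = contains_literal)
```

The argument order matters here: the first table's column is passed as the first argument of match_fun, so the table with the longer names has to come first.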
Upvotes: 1