Jane

Reputation: 81

Fuzzy matching two data frames

I want to merge two data frames df1 and df2.

library(tibble)

df1 <- tibble(x = c("FIDELITY FREEDOM 2015 FUND", "VANGUARD WELLESLEY INCOME FUND"), y = c(1, 2))
df2 <- tibble(x = c("FIDELITY ABERDEEN STREET TRUST: FIDELITY FREEDOM 2015 FUND", "VANGUARD/WELLESLEY INCOME FUND, INC: VANGUARD WELLESLEY INCOME FUND; INVESTOR SHARES"), z = c(2020, 2021))

I want to merge df1 and df2 on x, where each x in df1 appears as a substring of an x in df2. I am currently trying fuzzy matching with:

fuzzy_join(df1,df2,match_fun = function(x,y) grepl(x, y))

It gives me the following warning:

In grepl(x, y) :
  argument 'pattern' has length > 1 and only the first element will be used.

Do you have any ideas for merging df1 and df2? I am thinking about how to write the match_fun function but I am not sure how to progress. Thank you so much!

Upvotes: 2

Views: 576

Answers (1)

TarJae

Reputation: 79204

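We could use either fuzzy_inner_join() or regex_inner_join() from the fuzzyjoin package. The warning arises because grepl() is not vectorised over its pattern argument, whereas fuzzy_join() passes match_fun two equal-length vectors of candidate pairs. stringr::str_detect() is vectorised over both string and pattern, so it works as match_fun. Note that df2 goes first: its long names are the strings to search, and the short names in df1 are the patterns.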

library(fuzzyjoin)
library(stringr)
df2 %>% fuzzy_inner_join(df1, by = "x", match_fun = str_detect)
  x.x                                                                                      z x.y                                y
  <chr>                                                                                <dbl> <chr>                          <dbl>
1 FIDELITY ABERDEEN STREET TRUST: FIDELITY FREEDOM 2015 FUND                            2020 FIDELITY FREEDOM 2015 FUND         1
2 VANGUARD/WELLESLEY INCOME FUND, INC: VANGUARD WELLESLEY INCOME FUND; INVESTOR SHARES  2021 VANGUARD WELLESLEY INCOME FUND     2

or:

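library(fuzzyjoin)
library(dplyr)  # provides %>% if stringr is not already attached
# regex_inner_join treats the second table's x as a regex matched against the first table's x
df2 %>% regex_inner_join(df1, by = "x")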
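
If you would rather keep grepl(), one option is to wrap it so that it is applied pair by pair. The following is only a minimal sketch: the helper name contains_literal is made up for illustration (it is not part of fuzzyjoin), and it assumes you want literal substring matching (fixed = TRUE) rather than regex matching.

```r
library(tibble)
library(fuzzyjoin)

df1 <- tibble(
  x = c("FIDELITY FREEDOM 2015 FUND", "VANGUARD WELLESLEY INCOME FUND"),
  y = c(1, 2)
)
df2 <- tibble(
  x = c(
    "FIDELITY ABERDEEN STREET TRUST: FIDELITY FREEDOM 2015 FUND",
    "VANGUARD/WELLESLEY INCOME FUND, INC: VANGUARD WELLESLEY INCOME FUND; INVESTOR SHARES"
  ),
  z = c(2020, 2021)
)

# Hypothetical helper: TRUE where short[i] occurs literally inside long[i].
# mapply() walks both vectors in parallel, so grepl() only ever sees
# a single pattern at a time and the warning does not appear.
contains_literal <- function(long, short) {
  unname(mapply(function(s, l) grepl(s, l, fixed = TRUE), short, long))
}

# df2 is the left table, so its x values arrive as `long`
# and the x values from df1 arrive as `short`.
result <- fuzzy_inner_join(df2, df1, by = "x", match_fun = contains_literal)
print(result)
```

Using fixed = TRUE also means characters such as "." or "(" in a fund name are matched literally instead of being interpreted as regex syntax, which may or may not be what you want for your real data.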

Upvotes: 1
