Enes I.

Reputation: 125

Merge two data.frames using partially matching strings

I have the following two data.frames:

df1 <- data.frame(name = "RANDI FIRAT CAYLIOGLU", correct = 30)
df2 <- data.frame(name = "FIRAT CAYLIOGLU", id = 01)

Some people have three names, including a middle name, and sometimes use their first name and sometimes their second name. In my experience, the regex_join function of the fuzzyjoin package does not capture this kind of partial match. How can I merge two such datasets using partially matching names?

Upvotes: 0

Views: 82

Answers (1)

Cettt

Reputation: 11981

If it is always the first name that causes the problem, you can use a regex to strip it. Note that I set stringsAsFactors = FALSE so that the name columns are character vectors rather than factors.

df1 <- data.frame(name = "RANDI FIRAT CAYLIOGLU", correct = 30, stringsAsFactors = FALSE)
df2 <- data.frame(name = "FIRAT CAYLIOGLU", id = 01, stringsAsFactors = FALSE)

library(dplyr)
df1 %>%
  # drop the leading word (the first name) before joining
  mutate(name2 = sub("^[A-Za-z]+ ", "", name)) %>%
  full_join(df2, by = c("name2" = "name"))

                   name correct           name2 id
1 RANDI FIRAT CAYLIOGLU      30 FIRAT CAYLIOGLU  1

If the missing part can also be the middle name, you can create an additional column name3 that contains only the first and last names:

library(dplyr)
df1 %>%
  mutate(name2 = sub("^[A-Za-z]+ ", "", name),    # middle + last name
         name3 = sub(" [A-Za-z]+ ", " ", name)) %>% # first + last name
  left_join(df2, by = c("name2" = "name")) %>%
  left_join(df2, by = c("name3" = "name"))

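Here, name2 contains the middle and last names, and name3 contains the first and last names. Because df2 is joined twice, the result has two id columns (id.x and id.y), one for each kind of match.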
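If you would rather end up with a single id column, one option is to collapse the two join results with coalesce(). This is only a minimal sketch: the second person (AYSE NUR DEMIR / AYSE DEMIR) and the suffix names _name2 and _name3 are made up for illustration, and it assumes names consist of plain A-Z letters separated by single spaces.

```r
library(dplyr)

# Hypothetical data: the second row is invented to show a dropped middle name
df1 <- data.frame(name = c("RANDI FIRAT CAYLIOGLU", "AYSE NUR DEMIR"),
                  correct = c(30, 25), stringsAsFactors = FALSE)
df2 <- data.frame(name = c("FIRAT CAYLIOGLU", "AYSE DEMIR"),
                  id = c(1, 2), stringsAsFactors = FALSE)

merged <- df1 %>%
  mutate(name2 = sub("^[A-Za-z]+ ", "", name),     # middle + last name
         name3 = sub(" [A-Za-z]+ ", " ", name)) %>%  # first + last name
  left_join(df2, by = c("name2" = "name")) %>%
  # the second join would clash on "id", so give both copies explicit suffixes
  left_join(df2, by = c("name3" = "name"),
            suffix = c("_name2", "_name3")) %>%
  # take the id from the name2 match, falling back to the name3 match
  mutate(id = coalesce(id_name2, id_name3)) %>%
  select(name, correct, id)

merged
```

With this toy data, the first row is matched through name2 and the second through name3, so both rows get an id. Rows that match neither way keep NA in id.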

Upvotes: 1
