Reputation: 125
I have the following two data.frames:
df1 <- data.frame(name = "RANDI FIRAT CAYLIOGLU", correct = 30)
df2 <- data.frame(name = "FIRAT CAYLIOGLU", id = 01)
Some people have three names, including a middle name, and sometimes go by their first name and sometimes by their second. In my experience, the regex_join function of the fuzzyjoin package does not capture this kind of partial match. How can I merge two such datasets using partially matching names?
Upvotes: 0
Views: 82
Reputation: 11981
If it is always the first name that causes the problem, you can use a regex to get rid of it. Note that I create the name columns as characters rather than factors (stringsAsFactors = FALSE).
df1 <- data.frame(name = "RANDI FIRAT CAYLIOGLU", correct = 30, stringsAsFactors = FALSE)
df2 <- data.frame(name = "FIRAT CAYLIOGLU", id = 01, stringsAsFactors = FALSE)
library(dplyr)
df1 %>%
  # name2: the name with its first word removed
  mutate(name2 = sub("^[A-Za-z]+ ", "", name)) %>%
  full_join(df2, by = c("name2" = "name"))
name correct name2 id
1 RANDI FIRAT CAYLIOGLU 30 FIRAT CAYLIOGLU 1
If the extra name can also be the middle name, you can create an additional column name3 which contains only the first and last names:
library(dplyr)
df1 %>%
  mutate(name2 = sub("^[A-Za-z]+ ", "", name),
         name3 = sub(" [A-Za-z]+ ", " ", name)) %>%
  left_join(df2, by = c("name2" = "name")) %>%
  left_join(df2, by = c("name3" = "name"))
Here, name2 is just the middle name and last name, and name3 contains the first name and last name.
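Because df2 is joined twice, the result of the two left joins should carry two id columns (id.x and id.y with dplyr's default suffixes), one per key. If a single id column is wanted, one option is to look up both keys and keep the first one that matches. The following is only a minimal sketch in base R under that assumption; the helper variables idx2, idx3, idx and merged are names introduced here for illustration, and it assumes each key matches at most one row of df2.

```r
# Same example data as above, with names kept as character strings.
df1 <- data.frame(name = "RANDI FIRAT CAYLIOGLU", correct = 30,
                  stringsAsFactors = FALSE)
df2 <- data.frame(name = "FIRAT CAYLIOGLU", id = 1,
                  stringsAsFactors = FALSE)

# Candidate keys derived from the full name.
name2 <- sub("^[A-Za-z]+ ", "", df1$name)   # first word removed
name3 <- sub(" [A-Za-z]+ ", " ", df1$name)  # second word removed

# Row index in df2 for each candidate key (NA where nothing matches).
idx2 <- match(name2, df2$name)
idx3 <- match(name3, df2$name)

# Prefer the name2 match and fall back to the name3 match.
idx <- ifelse(is.na(idx2), idx3, idx2)

# Attach the id found in df2; rows without any match get NA.
merged <- cbind(df1, id = df2$id[idx])
print(merged)
```

Note that both patterns assume a three-word name: for a two-word name, name2 is reduced to the last name alone, so such rows may need to be matched on the full name instead.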
Upvotes: 1