Match similar names

Question

I have a database with three columns: name, occupation, and organization. In these columns, I have duplicates with slightly different names. For example, Anne Sue Frank and Anne S. Frank refer to the same person, as they have the same occupation and belong to the same organization.

Is there a way for me to create another table that maps these names to their corresponding matches? I tried using stringdist, but it mostly matches each name with itself (e.g., Anne Sue Frank with Anne Sue Frank in the same row), which I don't want. My objective is to find the duplicate names, so I need a new table showing each name in one column and its corresponding match in a second column.

Example:

df1 <- data.frame(
  name = c("Anne Sue Frank", "John S. Gooble", "Anne S. Frank", "Johnatan Sue Google"),
  organization = c("ABC", "FCV", "ABC", "FCV"), 
  occupation = c("director", "teacher", "director", "teacher"),
  stringsAsFactors = FALSE
)

df1

                 name organization occupation
1      Anne Sue Frank          ABC   director
2      John S. Gooble          FCV    teacher
3       Anne S. Frank          ABC   director
4 Johnatan Sue Google          FCV    teacher
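
For reference, the kind of mapping table described above could be built roughly as follows. This is only a minimal sketch, not a definitive solution. It assumes that two names count as a match when they share the same organization and occupation and their edit distance is small relative to the longer name. The threshold `max_rel_dist`, the helper column `id`, and the output column names `name` and `match` are made up for illustration. It uses base R's `adist()` (Levenshtein distance) rather than stringdist so that it runs without extra packages.

```r
# Example data from the question
df1 <- data.frame(
  name = c("Anne Sue Frank", "John S. Gooble", "Anne S. Frank", "Johnatan Sue Google"),
  organization = c("ABC", "FCV", "ABC", "FCV"),
  occupation = c("director", "teacher", "director", "teacher"),
  stringsAsFactors = FALSE
)

# Hypothetical threshold: largest allowed edit distance,
# expressed as a share of the longer name's length
max_rel_dist <- 0.5

# Row id, used below to exclude self-matches and mirrored pairs
df1$id <- seq_len(nrow(df1))

# Self-join on organization and occupation, so names are only
# compared with other names in the same group
pairs <- merge(df1, df1, by = c("organization", "occupation"),
               suffixes = c("_1", "_2"))

# Keep each unordered pair once; this also drops rows paired with themselves
pairs <- pairs[pairs$id_1 < pairs$id_2, ]

# Levenshtein distance per pair, scaled by the length of the longer name
pairs$dist <- mapply(function(a, b) adist(a, b)[1, 1],
                     pairs$name_1, pairs$name_2, USE.NAMES = FALSE)
pairs$rel_dist <- pairs$dist / pmax(nchar(pairs$name_1), nchar(pairs$name_2))

# Final mapping table: one row per pair of likely duplicates
matches <- pairs[pairs$rel_dist <= max_rel_dist,
                 c("name_1", "name_2", "organization", "occupation")]
names(matches)[1:2] <- c("name", "match")
rownames(matches) <- NULL
matches
```

On the example data this pairs Anne Sue Frank with Anne S. Frank and John S. Gooble with Johnatan Sue Google. The threshold would need tuning on real data, since a lenient value can pair different people who happen to share an organization and occupation.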
