What is the best method for fuzzy matching all elements of a single vector or column against all the elements within that same vector or column?

Question

For example, suppose I have a data.frame such as:

df <- data.frame(Name = c('Chris','Christopher','John','Jon','Jonathan'))

Is there a way for me to build a similarity matrix comparing how similar each individual name is to every other name in the 'Name' column?

I've tried using a loop, but I'm not sure how to apply it across the entire column:

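for(i in 1:nrow(df)){
  # compares name i only with name i + 1 (NA on the last row),
  # so this fills a single column instead of a full pairwise matrix
  df$distance[i] <- adist(df$Name[i], df$Name[i+1])
}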
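One way to extend that loop to every pair of names is to loop over both rows and columns and store each result in an n x n matrix. This is a minimal base-R sketch; the variable names `n` and `dist_mat` are only illustrative. It calls `adist()` once per pair, so it should produce the same numbers as the single vectorized call `adist(df$Name)`, just more slowly.

```r
df <- data.frame(Name = c('Chris', 'Christopher', 'John', 'Jon', 'Jonathan'),
                 stringsAsFactors = FALSE)

n <- nrow(df)
# Pre-allocate an n x n matrix; cell [i, j] holds the edit distance
# between name i and name j
dist_mat <- matrix(NA_real_, nrow = n, ncol = n,
                   dimnames = list(df$Name, df$Name))

for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    # adist() on two single strings returns a 1 x 1 matrix; [1, 1] extracts the value
    dist_mat[i, j] <- adist(df$Name[i], df$Name[j])[1, 1]
  }
}

dist_mat
```

Because edit distance is symmetric, the inner loop could start at `j = i` and mirror the result, but for a handful of names the simpler version is clearer.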

markhogue · Accepted Answer

I got @zephryl's solution to work with some minor edits.

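df <- data.frame('Name' = c('Chris','Christopher','John','Jon','Jonathan'))

# With a single argument, adist() compares every element with every other
# element and returns an n x n matrix of Levenshtein (edit) distances
distances <- adist(df$Name)
distances <- as.data.frame(distances)
# Label rows and columns with the names so each cell is easy to read
rownames(distances) <- df$Name
colnames(distances) <- df$Name

distances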
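Note that `adist()` returns distances (0 = identical), while the question asks for a similarity matrix. One common normalization, assumed here rather than taken from the answer above, divides each distance by the length of the longer string in the pair: similarity = 1 - d / max(nchar(a), nchar(b)), which gives 1 for identical names and 0 for names with nothing in common. The sketch below also shows a simple fuzzy match: for each name, pick the most similar *other* name by masking the diagonal. The variable names (`sim`, `best`, `matches`) are only illustrative.

```r
df <- data.frame(Name = c('Chris', 'Christopher', 'John', 'Jon', 'Jonathan'),
                 stringsAsFactors = FALSE)

# Raw Levenshtein edit distances between every pair of names
d <- adist(df$Name)

# Length of the longer string in each pair, as an n x n matrix
len <- nchar(df$Name)
max_len <- outer(len, len, pmax)

# Assumed normalization: 1 = identical, 0 = nothing in common
sim <- 1 - d / max_len
dimnames(sim) <- list(df$Name, df$Name)
round(sim, 2)

# Fuzzy match: for each name, find the most similar other name
sim_no_self <- sim
diag(sim_no_self) <- NA              # ignore each name's match with itself
best <- apply(sim_no_self, 1, which.max)  # which.max() skips the NA
matches <- data.frame(Name = df$Name,
                      Closest = df$Name[best],
                      Similarity = round(sim_no_self[cbind(seq_along(best), best)], 2),
                      stringsAsFactors = FALSE)
matches
```

Other normalizations (for example dividing by the sum of both lengths) rank pairs slightly differently, so the choice depends on how strongly length differences should count.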
