How to match strings between two columns in R?

Question

I want to create a new column (MATCH) on the basis of string match between two existing columns. For example -

st_add	aa_add	MATCH
jai maa durga society	jai maa durga colony	MATCH
elph road highway 1	road highway 2 elph	MATCH
srinivan colony parel ist	srinivan bus depot	NOT MATCH

If three or more words match between column 1 and column 2, then column 3 (MATCH) should show "MATCH". But if fewer than 3 words match, or there is no match at all (example row 3), then the result should be "NOT MATCH".

How can I do this using R?

Ronak Shah · Accepted Answer

You can split st_add and aa_add into words and count the number of words they have in common. If the count is greater than or equal to 3, assign 'MATCH'; otherwise assign 'NOT MATCH'.

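# Split each address on whitespace, count the words shared by both columns,
# and label rows with at least 3 shared words as 'MATCH'.
df$MATCH <- ifelse(mapply(function(x, y) length(intersect(x, y)), 
                strsplit(df$st_add, '\\s+'),
                strsplit(df$aa_add, '\\s+')) >= 3, 'MATCH', 'NOT MATCH')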
df

#                     st_add               aa_add     MATCH
#1     jai maa durga society jai maa durga colony     MATCH
#2       elph road highway 1  road highway 2 elph     MATCH
#3 srinivan colony parel ist   srinivan bus depot NOT MATCH

data

df <- structure(list(st_add = c("jai maa durga society", "elph road highway 1", 
"srinivan colony parel ist"), aa_add = c("jai maa durga colony", 
"road highway 2 elph", "srinivan bus depot")), row.names = c(NA, 
-3L), class = "data.frame")
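
If the addresses may differ in letter case or carry stray leading or trailing spaces, the same idea can be wrapped in a small helper. This is only a sketch: count_common_words and the n_common column are hypothetical names that are not part of the original answer, and lower-casing and trimming are assumptions about how the data should be cleaned.

```r
# Hypothetical helper (not from the original answer): for each pair of
# strings, count the distinct words that appear in both.
count_common_words <- function(a, b) {
  # Assumption: matching should ignore case and surrounding whitespace
  words_a <- strsplit(tolower(trimws(a)), "\\s+")
  words_b <- strsplit(tolower(trimws(b)), "\\s+")
  # intersect() drops duplicates, so each shared word is counted once
  mapply(function(x, y) length(intersect(x, y)), words_a, words_b,
         USE.NAMES = FALSE)
}

df <- data.frame(
  st_add = c("jai maa durga society", "elph road highway 1",
             "srinivan colony parel ist"),
  aa_add = c("jai maa durga colony", "road highway 2 elph",
             "srinivan bus depot"),
  stringsAsFactors = FALSE
)

# Keep the count in its own column so the threshold is easy to change
df$n_common <- count_common_words(df$st_add, df$aa_add)
df$MATCH <- ifelse(df$n_common >= 3, "MATCH", "NOT MATCH")
df
```

Keeping the count in a separate column makes it easy to inspect borderline rows or to try a different threshold than 3.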
