George

Reputation: 347

Finding character duplicates in R

I have a data frame in R with two columns, first.name and last.name. I want to find people who've entered their first or last name twice. I wrote this to find exact equality:

df[df$first.name == df$last.name, ]

However, that only returns rows where the two fields are exactly equal, for example people who wrote "George King" in the first.name field and "George King" in the last.name field. I also want to find someone who wrote "George" in the first.name column and "George King" in the last.name column.

I need some sort of function that checks, within each row, whether one field contains the other, rather than requiring the two to be exactly the same.

Upvotes: 0

Views: 849

Answers (1)

generic_user

Reputation: 3562

Have a look at the grep family of functions:

1> x = c('George', 'George King')
1> grepl(x[1],x[2])
[1] TRUE

?grepl

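Note that in a data frame you'd need to run this row by row. grepl is vectorized over the strings being searched, but not over the pattern: if you pass a vector as the pattern, only its first element is used. Use some sort of apply strategy, such as mapply, to pair each first.name with the last.name in the same row.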
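A minimal sketch of the row-by-row approach, assuming a data frame called df with character columns first.name and last.name as in the question. The sample rows and the helper name contains are made up for illustration. It uses fixed = TRUE so that names are matched as literal text rather than as regular expressions, and checks both directions in case the longer entry is in first.name.

```r
# Hypothetical sample data; only the column names come from the question.
df <- data.frame(
  first.name = c("George", "George King", "Anna", "Maria"),
  last.name  = c("George King", "George King", "Smith", "Lopez"),
  stringsAsFactors = FALSE
)

# TRUE when either field contains the other as a literal substring.
# fixed = TRUE treats the pattern as plain text, not a regular expression.
contains <- function(a, b) {
  grepl(a, b, fixed = TRUE) || grepl(b, a, fixed = TRUE)
}

# mapply walks the two columns in parallel, one row at a time.
dup <- unname(mapply(contains, df$first.name, df$last.name))

# Rows where one name field appears inside the other.
df[dup, ]
```

This is plain substring matching, so a short name that happens to sit inside a longer one (for example "Al" and "Alvarez") is flagged too, and an empty string matches everything. The match is also case-sensitive; lowercasing both columns with tolower() first would be one way around that.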

Upvotes: 2
