I have a reference table containing people's names and attributes (bad emails).
I want to subset the table based on people's names, as below.
I have tried two ways of doing this:
library(dplyr)  # for filter()
# Returns nothing
subset(bad.email, User.Name %in% c('John'))
filter(bad.email, User.Name %in% 'John')
# Returns what I'm looking for
subset(bad.email, grepl("John", User.Name))
filter(bad.email, grepl("John", User.Name))
Can anyone explain why this happens?
In the end I want to replace 'John' with a column from a reference table, but first I want to wrap my head around the concept.
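A minimal reproducible sketch of the behaviour described above, assuming hypothetical data in which `User.Name` holds full names (first name and surname) rather than first names alone. The `Email` column and all values are made up for illustration; base `subset()` is used so the example runs without extra packages.

```r
# Hypothetical stand-in for the real bad.email table
bad.email <- data.frame(
  User.Name = c("John Smith", "Jane Doe", "john brown"),
  Email = c("a@example.com", "b@example.com", "c@example.com"),
  stringsAsFactors = FALSE
)

# %in% compares whole values, so "John" is never equal to "John Smith"
exact <- subset(bad.email, User.Name %in% c("John"))
nrow(exact)        # 0

# grepl() looks for "John" anywhere inside each name (case-sensitive,
# so "john brown" is not matched)
partial <- subset(bad.email, grepl("John", User.Name))
partial$User.Name  # "John Smith"
```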
Based on feedback from @akrun and @David Arenburg:
One possible reason the first two lines of code fail is that there may be leading or trailing whitespace in the field. When reading the file with read.csv/read.table, setting strip.white = TRUE removes this whitespace from unquoted character fields. Alternatively, str_trim() from the stringr package will also do it.
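A small sketch of the whitespace problem, assuming a hypothetical CSV with a padded name. It uses base R's `trimws()` for the after-the-fact trim so it runs without stringr; `stringr::str_trim()` does the same job.

```r
# Hypothetical CSV text: the first name is padded with spaces
csv_text <- "User.Name,Email
 John Smith ,a@example.com
Jane Doe,b@example.com"

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
stripped <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                     strip.white = TRUE)

raw$User.Name[1]       # " John Smith " (padding kept)
stripped$User.Name[1]  # "John Smith"   (padding stripped on read)

# The padding breaks an exact %in% match...
before <- "John Smith" %in% raw$User.Name  # FALSE

# ...until the column is trimmed after reading
raw$User.Name <- trimws(raw$User.Name)
after <- "John Smith" %in% raw$User.Name   # TRUE
```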
The second, and main, reason my code didn't work is that %in% only matches values that are exactly equal to what you are looking for; it does not search for a pattern inside a string.
In my case the field also contained surnames, and I was only matching on the first name. To fix this, either match on the whole name or match using grepl. grepl is case-sensitive by default, so setting ignore.case = TRUE matches the field regardless of case.
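A sketch contrasting the two approaches on hypothetical data, plus one way to work towards the original goal of using a column from a reference table instead of a hard-coded name. The `ref` table and its `Name` column are assumptions for illustration; `%in%` accepts any character vector on its right-hand side, so a reference column can be used directly.

```r
# Hypothetical data with full names in User.Name
bad.email <- data.frame(
  User.Name = c("John Smith", "Jane Doe", "john brown"),
  stringsAsFactors = FALSE
)

# Exact, whole-value match: only "John Smith" itself qualifies
exact <- subset(bad.email, User.Name %in% c("John Smith"))

# Substring match with case ignored: "John Smith" and "john brown"
hits <- subset(bad.email, grepl("john", User.Name, ignore.case = TRUE))

# Hypothetical reference table; its column replaces the hard-coded name
ref <- data.frame(Name = c("John Smith", "Jane Doe"),
                  stringsAsFactors = FALSE)
from_ref <- subset(bad.email, User.Name %in% ref$Name)
from_ref$User.Name  # "John Smith" "Jane Doe"
```

Note that grepl() treats its pattern as a regular expression; names containing characters such as `.` or `(` would need `fixed = TRUE` or escaping.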
library(stringr)  # for str_trim()
library(dplyr)    # for filter()
# Remove leading/trailing whitespace
bad.email$User.Name <- str_trim(bad.email$User.Name)
# Returns what I'm looking for using %in% (exact match on the full name)
subset(bad.email, User.Name %in% c('John Smith'))
filter(bad.email, User.Name %in% 'John Smith')
# Returns what I'm looking for using grepl (pattern match, case ignored)
subset(bad.email, grepl("John Smith", User.Name, ignore.case = TRUE))
filter(bad.email, grepl("John Smith", User.Name, ignore.case = TRUE))