J.Sabree

Reputation: 2546

Why is filter(str_detect()) returning the wrong values in R?

I'm trying to match people whose title corresponds to a certain job code, but there are many abbreviations (e.g., "dr." and "dir" both mean director). For some reason, my code yields obviously wrong answers (e.g., it retains 'kvp coordinator' in the example below), and I can't figure out what's going on:

library(dplyr)
library(stringr)
test <- tibble(name = c("Corey", "Sibley", "Justin", "Kate", "Ruth", "Phil", "Sara"),
               title = c("kvp coordinator", "manager", "director", "snr dr. of marketing", "drawing expert", "dir of finance", "direct to mail expert"))

test %>%
  filter(str_detect(title, "chief|vp|president|director|dr\\.|dir\\ |dir\\."))

In the above example, only Justin, Kate, and Phil should be left, but somehow the filter doesn't drop Corey.

In addition to an answer, if you could explain why I'm getting this bizarre result, I'd really appreciate it.

Upvotes: 1

Views: 247

Answers (1)

Karthik S

Reputation: 11548

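str_detect() looks for the pattern anywhere in the string, so the vp alternative matches the "vp" inside "kvp coordinator". That's why Corey is kept in the output. Wrapping vp in word boundaries (\\b) makes it match only as a whole word: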

test %>% filter(str_detect(title, "chief|\\bvp\\b|president|director|dr\\.|dir\\ |dir\\."))
# A tibble: 3 x 2
  name   title               
  <chr>  <chr>               
1 Justin director            
2 Kate   snr dr. of marketing
3 Phil   dir of finance      
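
To see the substring behaviour in isolation, here is a minimal sketch. The titles vector is made up for illustration and is not part of the question's data; str_extract() is used only to show which alternative triggered the match.

```r
library(stringr)

# Hypothetical titles, chosen to contrast "vp" as a word vs. as a substring
titles <- c("kvp coordinator", "vp of sales", "svp", "dir of finance")

# Unanchored: "vp" is found anywhere in the string, including inside "kvp"
loose <- str_detect(titles, "vp")
loose
# TRUE TRUE TRUE FALSE

# With word boundaries: "vp" must stand alone as a whole word
strict <- str_detect(titles, "\\bvp\\b")
strict
# FALSE TRUE FALSE FALSE

# str_extract() returns the piece of the string that matched,
# which is a quick way to debug a long alternation
hit <- str_extract("kvp coordinator",
                   "chief|vp|president|director|dr\\.|dir\\ |dir\\.")
hit
# "vp"
```

Running str_extract() with the original pattern on any row that is unexpectedly kept shows the offending alternative right away.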

Upvotes: 1
