dancoup

Reputation: 25

Detect strings that both include and exclude certain words (with the stringr package)

I'm new to R and couldn't find an answer to this. I have a character vector and want to detect the strings that contain MS, MA or Master but not MBA:

input <- c("Master of Business Administration (MBA) program", "MS, MA, Master", "Master")

Desired output with str_detect:

FALSE, TRUE, TRUE

Edit: this worked for me now:

str_detect(input, "\\bMS\\b|\\bMaster\\b|\\bMA\\b") & !str_detect(input,"\\bMBA\\b")
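The two-condition approach in the edit generalizes to any list of words. Below is a minimal sketch, assuming the words are plain text with no regex metacharacters. `detect_words` is a hypothetical helper (not part of stringr or base R), and base R `grepl(perl = TRUE)` is used so the sketch runs without extra packages; `str_detect` could be swapped in.

```r
# Hypothetical helper: TRUE for strings that contain at least one `include`
# word and none of the `exclude` words, matching whole words only.
# Assumes the words contain no regex metacharacters.
detect_words <- function(x, include, exclude) {
  # Build patterns like "\\b(?:MS|MA|Master)\\b"
  inc_pattern <- paste0("\\b(?:", paste(include, collapse = "|"), ")\\b")
  exc_pattern <- paste0("\\b(?:", paste(exclude, collapse = "|"), ")\\b")
  grepl(inc_pattern, x, perl = TRUE) & !grepl(exc_pattern, x, perl = TRUE)
}

input <- c("Master of Business Administration (MBA) program", "MS, MA, Master", "Master")
detect_words(input, include = c("MS", "MA", "Master"), exclude = "MBA")
# [1] FALSE  TRUE  TRUE
```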

Upvotes: 1

Views: 1627

Answers (2)

Wiktor Stribiżew

Reputation: 626845

You can use a single PCRE pattern (with grepl you need perl=TRUE, because the default TRE engine does not support lookaheads):

> grepl('^(?!.*\\bMBA\\b).*\\b(?:Master|MS|MA)\\b', input, perl=TRUE)
[1] FALSE  TRUE  TRUE

The same pattern also works with str_detect, because stringr uses the ICU regex engine, which supports lookaheads:

> str_detect(input, '^(?!.*\\bMBA\\b).*\\b(?:Master|MS|MA)\\b')
[1] FALSE  TRUE  TRUE

Details

  • ^ - start of string
  • (?!.*\\bMBA\\b) - a negative lookahead that fails the match if the whole word MBA appears anywhere before the first line break (add (?s) at the start of the pattern so that multi-line input is checked in full)
  • .* - zero or more characters other than line breaks, as many as possible
  • \\b(?:Master|MS|MA)\\b - the whole word Master, MS or MA.
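To illustrate the (?s) note, here is a small sketch on a hypothetical two-line string, using a pattern of the same shape as above. Without (?s), `.` stops at the line break, so the lookahead never sees an MBA on a later line.

```r
# Hypothetical two-line input: "Master" on line 1, "MBA" on line 2
x <- "Master degree\nMBA track"

# Without (?s), `.` does not match "\n", so the lookahead only scans line 1
no_dotall <- '^(?!.*\\bMBA\\b).*\\b(?:Master|MS|MA)\\b'
# With (?s), `.` also matches "\n", so the lookahead sees "MBA" on line 2
dotall <- '(?s)^(?!.*\\bMBA\\b).*\\b(?:Master|MS|MA)\\b'

grepl(no_dotall, x, perl = TRUE)  # TRUE: the MBA on line 2 is missed
grepl(dotall, x, perl = TRUE)     # FALSE: MBA is found, so the lookahead fails
```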

Upvotes: 3

ozanstats
ozanstats

Reputation: 2864

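You can combine two logical conditions, one for the words to include and a negated one for the word to exclude:

library(stringr)

input <- c("Master of Business Administration (MBA) program", "MS, MA, Master", "Master")

# TRUE when MS, MA or Master occurs as a whole word AND MBA does not
str_detect(input, "\\b(?:MS|MA|Master)\\b") & !str_detect(input, "\\bMBA\\b")
# [1] FALSE  TRUE  TRUE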
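Whether the patterns use \\b word boundaries matters when the target letters can also appear inside longer words. The strings below are hypothetical, and base R grepl is used so the sketch runs without stringr; str_detect should give the same results for these patterns.

```r
# Hypothetical strings where the target letters occur inside longer words
x <- c("MSc in Statistics", "MANAGEMENT track", "MS in Physics")

# Plain substring search: "MS" matches inside "MSc", "MA" inside "MANAGEMENT"
grepl("MS|MA", x)
# [1] TRUE TRUE TRUE

# With \\b word boundaries, only the standalone word "MS" matches
grepl("\\b(?:MS|MA)\\b", x, perl = TRUE)
# [1] FALSE FALSE  TRUE
```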

Upvotes: 1
