Reputation: 117
I'm working on matching strings from a write-in survey field which asks about a major field of study.
For the "Arts" section I want to match all types of Arts degrees, but specifically exclude liberal arts, culinary arts and language arts.
This code correctly excludes "liberal arts", but I have been unable to list multiple things to exclude.
I am using the grepl function in R with Perl-compatible regular expressions (perl=TRUE).
field_1_1_arts <-c("\\b(dance|ballet|design|film|(?<!liberal )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b")
data$field_1_1_arts <- grepl(field_1_1_arts,data$major_fields,ignore.case=TRUE, perl=TRUE)
I have tried:
#this allows both liberal and culinary to pass
field_1_1_arts <-c("\\b(dance|ballet|design|film|(?<!liberal )arts?|(?<!culinary )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b")
#this gives an invalid expression error
field_1_1_arts <-c("\\b(dance|ballet|design|film|(?<!(liberal|culinary) )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b")
Upvotes: 1
Views: 147
Reputation: 627469
To exclude more than just "liberal", add one negative lookbehind per excluded word, e.g.:
"\\b(dance|ballet|design|film|(?<!liberal )(?<!culinary )(?<!language )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b"
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can also use a single lookbehind with | between the alternatives (each alternative keeps its own trailing space):
"\\b(dance|ballet|design|film|(?<!liberal |culinary |language )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b"
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
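As for why the two attempts in the question fail: in the first one, the two arts? branches are independent alternatives, so "culinary arts" is accepted by the (?<!liberal )arts? branch and "liberal arts" by the (?<!culinary )arts? branch. The second one most likely errors because the nested group (liberal|culinary) makes a single lookbehind branch variable-length, which the PCRE engine used by R rejects (this may differ with newer PCRE2 builds). Here is a minimal sketch for checking both working patterns. The sample responses in major_fields are made up for illustration, since the real data$major_fields is not shown in the question:

```r
# Hypothetical sample responses (not from the original survey data).
major_fields <- c("Liberal Arts", "culinary arts", "Language Arts",
                  "Fine Arts", "Art history", "Music", "Biology")

# Variant 1: one fixed-length negative lookbehind per excluded word.
stacked <- "\\b(dance|ballet|design|film|(?<!liberal )(?<!culinary )(?<!language )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b"

# Variant 2: a single lookbehind with top-level alternatives.
alternated <- "\\b(dance|ballet|design|film|(?<!liberal |culinary |language )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b"

res_stacked    <- grepl(stacked,    major_fields, ignore.case = TRUE, perl = TRUE)
res_alternated <- grepl(alternated, major_fields, ignore.case = TRUE, perl = TRUE)

# Side-by-side view of which responses each pattern flags as "Arts".
print(data.frame(major_fields, res_stacked, res_alternated))
```

Both variants should flag "Fine Arts", "Art history" and "Music", and skip the three excluded phrases and "Biology". Note that the lookbehinds only check for the excluded word followed by exactly one space, so a response like "liberal  arts" with two spaces would still match.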
Upvotes: 2