andrea

Reputation: 117

Regex - Multiple Negative Lookbehinds

I'm working on matching strings from a write-in survey field which asks about a major field of study.

For the "Arts" section I want to match all types of Arts degrees, but specifically exclude liberal arts, culinary arts and language arts.

This code correctly excludes "liberal arts", but I have been unable to list multiple things to exclude.

I am using the grepl function in R with perl-compatible regexps.

field_1_1_arts <-c("\\b(dance|ballet|design|film|(?<!liberal )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b") 

data$field_1_1_arts <- grepl(field_1_1_arts,data$major_fields,ignore.case=TRUE, perl=TRUE)

I have tried:

#this allows both liberal and culinary to pass
field_1_1_arts <-c("\\b(dance|ballet|design|film|(?<!liberal )arts?|(?<!culinary )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b") 

#this gives an invalid expression error
field_1_1_arts <-c("\\b(dance|ballet|design|film|(?<!(liberal|culinary) )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b") 

Upvotes: 1

Views: 147

Answers (1)

Wiktor Stribiżew

Reputation: 627469

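To exclude more than "liberal", chain additional negative lookbehinds one after another. They are all checked at the same position, so every one of them must pass before arts? can match. Putting each lookbehind in its own alternative does not work, because "culinary arts" is still matched by the (?<!liberal )arts? branch and "liberal arts" by the (?<!culinary )arts? branch. E.g.: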

"\\b(dance|ballet|design|film|(?<!liberal )(?<!culinary )(?<!language )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b"
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See the regex demo

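You can also use | inside a single lookbehind. PCRE accepts top-level alternatives of different lengths there, as long as each alternative is itself fixed-length. The grouped form in the question, (?<!(liberal|culinary) ), is rejected because its single top-level branch no longer has one fixed length: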

"\\b(dance|ballet|design|film|(?<!liberal |culinary |language )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b"
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See another demo
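
As a quick sanity check, here is a minimal sketch that runs both patterns through grepl. The majors vector is made-up sample data, not the actual survey responses, and the variable names are only illustrative.

```r
# Hypothetical sample responses (assumption: not taken from the real survey data).
majors <- c("Fine Arts", "Liberal Arts", "Culinary Arts", "Language Arts",
            "Graphic Design", "Art History", "Biology")

# Variant 1: consecutive negative lookbehinds, all tested before "arts?".
arts_stacked <- "\\b(dance|ballet|design|film|(?<!liberal )(?<!culinary )(?<!language )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b"

# Variant 2: one lookbehind with top-level alternatives.
arts_alternated <- "\\b(dance|ballet|design|film|(?<!liberal |culinary |language )arts?|music|photograph(ic|y)|theat(er|re)|performing|visual)\\b"

# perl = TRUE is required: lookbehinds are not supported by the default regex engine.
is_arts_stacked    <- grepl(arts_stacked,    majors, ignore.case = TRUE, perl = TRUE)
is_arts_alternated <- grepl(arts_alternated, majors, ignore.case = TRUE, perl = TRUE)

# Show which sample majors each variant flags.
print(data.frame(major = majors,
                 stacked = is_arts_stacked,
                 alternated = is_arts_alternated))
```

With these samples both variants flag only "Fine Arts", "Graphic Design" and "Art History". Note that a response such as "Language Arts and Music" would still be flagged, because "music" matches on its own.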

Upvotes: 2
