Reputation: 464
I'm trying to use a PCRE regex to match the following list of words:
Out of the following strings:
milk, goatmilk, goat milk, cow milk, watch out for ( milk, eggs), egg, cornstarch
milk. goatmilk. goat milk. cow milk. watch out for ( milk, eggs). egg. cornstarch
milk goatmilk goat milk cow milk watch out for ( milk, eggs). egg cornstarch
This would be an easy exercise, but sadly the regex must not match any of these words:
In the above case the string should match because of the words:
But if the string does not contain any of those words, it should not match, for example:
sugar, wheat, goatmilk, goat milk, cornstarch
I've tried to apply these, but without any success:
The closest regex I got from the resources above was:
\b(?!(?:goatmilk|goat\smilk))(egg|milk)\b
This will still match every occurrence of the word milk and, worse, it will skip the word eggs because of the word boundaries. If I remove the trailing word boundary, it will also match the milk in goatmilk.
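To make the failure concrete, here is a small sketch that runs the pattern above unchanged. It uses Python's `re` module purely for illustration (the construct behaves the same way there as in PCRE); the sample string and the variable names are made up for this example.

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = re.compile(r"\b(?!(?:goatmilk|goat\smilk))(egg|milk)\b")

# Hypothetical sample string built from the phrases in the question.
sample = "goatmilk, goat milk, cow milk, ( milk, eggs), egg"

# With a single capturing group, findall returns the captured text.
attempt_hits = ATTEMPT.findall(sample)
print(attempt_hits)  # ['milk', 'milk', 'milk', 'egg']
```

The first hit is the milk of "goat milk": the lookahead is evaluated at the position where milk starts, so it never sees the goat that precedes it. The word eggs is missing because `egg\b` cannot match when an s follows.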
I already thought of using two regular expressions: one to match all the words and another to check the matched words against the excluded ones. This would work perfectly if not for the space between goat and milk, because the goat part would not be in the match.
If there is no option to do this, I'll use PHP to explode on spaces and walk through the array; when a match is found, the previous index value is checked to see whether the combination forms an excluded phrase, which mitigates the space issue. However, I would rather not do this as I believe this option is quite ugly :(
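For reference, the fallback described above could look roughly like the following. This is only a sketch in Python rather than PHP, and the names `WANTED`, `EXCLUDED_PREFIX` and `contains_allowed_word` are hypothetical; the word sets are assumptions based on the words mentioned in this post.

```python
import string

WANTED = {"egg", "eggs", "milk"}  # assumed target words
EXCLUDED_PREFIX = "goat"          # assumed word that disqualifies a following "milk"

def contains_allowed_word(text):
    """Split on whitespace and check each wanted token against its predecessor."""
    # Strip surrounding punctuation so "milk," and "eggs)." compare cleanly.
    tokens = [t.strip(string.punctuation).lower() for t in text.split()]
    for i, token in enumerate(tokens):
        if token not in WANTED:
            continue
        # "goat milk" is split into two tokens, so look one index back.
        if token == "milk" and i > 0 and tokens[i - 1] == EXCLUDED_PREFIX:
            continue
        return True
    return False

print(contains_allowed_word("sugar, wheat, goatmilk, goat milk, cornstarch"))  # False
print(contains_allowed_word("cow milk, watch out for ( milk, eggs)"))          # True
```

The single-token form goatmilk needs no special handling here because it is simply not in the wanted set.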
Upvotes: 2
Views: 499
Reputation: 626699
If you just have to avoid returning milk that is part of goatmilk or goat milk, you can use a (*SKIP)(*FAIL) regex:
\bgoat\s*milk\b(*SKIP)(*FAIL)|\b(?:eggs?|milk)\b
See the regex demo
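The \bgoat\s*milk\b(*SKIP)(*FAIL) branch will match goatmilk or goat milk and then discard that match because of the two PCRE verbs: (*FAIL) forces the branch to fail, and (*SKIP) makes the next match attempt start after the text consumed so far instead of retrying inside it. The \b(?:eggs?|milk)\b branch will return all remaining egg, eggs and milk occurrences as whole words.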
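If you ever need the same behavior in an engine without backtracking verbs, the idea can be approximated with a capture group. The sketch below uses Python's standard `re` module, which does not support (*SKIP)(*FAIL), so this is a swapped-in technique rather than the regex above: the excluded phrase is matched first without a group, and only matches where group 1 participated are kept. `PATTERN` and `find_allowed` are names made up for this example.

```python
import re

# Branch 1 consumes "goatmilk" / "goat milk" without capturing anything.
# Branch 2 captures the words we actually want in group 1.
PATTERN = re.compile(r"\bgoat\s*milk\b|\b(eggs?|milk)\b")

def find_allowed(text):
    """Return wanted words, ignoring any milk that belongs to goat(s)milk."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

first = "milk, goatmilk, goat milk, cow milk, watch out for ( milk, eggs), egg, cornstarch"
second = "sugar, wheat, goatmilk, goat milk, cornstarch"

print(find_allowed(first))   # ['milk', 'milk', 'milk', 'eggs', 'egg']
print(find_allowed(second))  # []
```

A string "matches" in the sense of the question when the returned list is non-empty. In PHP with PCRE the verb-based pattern is shorter, since the discarded branch never shows up in the results at all.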
Upvotes: 1