Un Peu

Reputation: 131

Regex to match all words except a given list (2)

I have already read the popular (28k views) question about this regex, but its answer doesn't work for me. I have found a better regex, but I am stuck on one small detail.

Here is the list of drinks:

whisky/gin/nuka-cola/beer/liqueur/abs-inth/tea

and the script should return all the non-soft drinks. I found a nice regex for this:

/\b(?!(?:tea|nuka\-cola)\b)[\w\d\-]+\b/

And the result is:

1 : whisky
2 : gin
3 : -cola
4 : beer
5 : liqueur
6 : abs-inth

The problem is the -cola (3rd result). It appears because \b treats the '-' character as a non-word character, so there is a word boundary on either side of it. Please help me remove this -cola from the list.

Upvotes: 0

Views: 4581

Answers (2)

Francis Gagnon

Reputation: 3675

This regular expression should do the trick:

(?>[\w-]+)(?<!tea|nuka-cola)

Another possibility, if you make sure each keyword starts with a forward slash:

/(?!tea|nuka-cola)([\w-]+)

If you plan on having more than just two drinks that shouldn't appear in your results, the regex can get ugly quickly. In that case I would use a regex (or a simple loop) that matches every word in the list and check whether each matched word is present in a HashSet of excluded drinks. If it is, I would leave it out of the results.
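The set-based approach could look roughly like this. It is only a sketch in Python, since the question doesn't name a language; the names `EXCLUDED` and `non_soft_drinks` are made up for illustration, and a Python `set` stands in for the HashSet.

```python
import re

# Drinks to leave out of the results (hypothetical name, illustration only).
EXCLUDED = {"tea", "nuka-cola"}


def non_soft_drinks(drinks):
    """Return every word in the slash-separated list that is not excluded."""
    # Match every run of word characters and dashes, with no exclusion logic.
    words = re.findall(r"[\w-]+", drinks)
    # Filter afterwards with a set lookup instead of a negative lookahead.
    return [word for word in words if word not in EXCLUDED]


result = non_soft_drinks("whisky/gin/nuka-cola/beer/liqueur/abs-inth/tea")
print(result)
```

Adding another excluded drink then only means adding one entry to the set; the regex itself never changes.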

Upvotes: 1

Tim Pietzcker

Reputation: 336168

\b matches between a word character (a letter, digit, or underscore) and a non-word character, so it matches both before and after the dash in nuka-cola.
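To see this concretely, here is a small Python sketch (Python is an assumption, since the question doesn't name a language). It lists the offsets in nuka-cola where \b matches, then runs the pattern from the question; the variable names are only for illustration.

```python
import re

text = "whisky/gin/nuka-cola/beer/liqueur/abs-inth/tea"

# Offsets in "nuka-cola" where \b matches: the start, both sides of the
# dash, and the end.
boundaries = [m.start() for m in re.finditer(r"\b", "nuka-cola")]
print(boundaries)  # [0, 4, 5, 9]

# The pattern from the question. The lookahead rejects "nuka-cola" at its
# first letter, but the boundary just before the dash lets the match
# restart there.
pattern = re.compile(r"\b(?!(?:tea|nuka\-cola)\b)[\w\d\-]+\b")
matches = pattern.findall(text)
print(matches)
```

The second print shows the stray -cola entry from the question, between gin and beer.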

Therefore, you can't use \b as a word boundary anchor, but you can define your own. Seeing that your separator is /, simply use (?<=/|^) as the "start-of-word" anchor, and (?=/|$) as the "end-of-word" anchor:

/(?<=\/|^)(?!(?:tea|nuka\-cola)(?=\/|$))[\w\d\-]+(?=\/|$)/
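For what it's worth, here is how the same idea could be tried in Python, which is an assumption on my part. As far as I know, Python's built-in re module only accepts fixed-width lookbehind and so rejects (?<=/|^). The sketch therefore swaps in (?<![^/]) ("not preceded by a non-slash character") for the start anchor and (?![^/]) for the end anchor, which should be equivalent here. The names `START`, `END` and `strong_drinks` are made up for illustration.

```python
import re

text = "whisky/gin/nuka-cola/beer/liqueur/abs-inth/tea"

# Custom "word boundaries" for a slash-separated list (assumed equivalents
# of (?<=/|^) and (?=/|$) that Python's re module accepts).
START = r"(?<![^/])"  # at the start of the string or right after a slash
END = r"(?![^/])"     # at the end of the string or right before a slash

pattern = re.compile(
    START
    + r"(?!(?:tea|nuka-cola)" + END + r")"  # skip whole-word exclusions only
    + r"[\w-]+"
    + END
)

strong_drinks = pattern.findall(text)
print(strong_drinks)
```

Because the exclusion is anchored with the end-of-word lookahead, a longer name that merely starts with tea (say, teapot) would still be matched.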

Of course this assumes you're using a regex engine that supports lookbehind assertions. Unfortunately, you didn't specify which language this is for. JavaScript, for example, did not support lookbehind before ES2018.

Upvotes: 1
