ghaschel

Reputation: 1355

Regex: Match pattern unless preceded by pattern containing element from the matching character class

I am having a hard time coming up with a regex to match a specific case:

This can be matched:

  • any-dashed-strings
  • this-can-be-matched-even-though-its-big

This cannot be matched (strings starting with elem- or asdf-, or a single -):

  • elem-this-cannot-be-matched
  • asdf-this-cannot-be-matched
  • -

So far what I came up with is:

/\b(?!elem-|asdf-)([\w\-]+)\b/

But I keep matching a single - and the whole -this-cannot-be-matched suffix. I cannot figure out how to conditionally ignore a character that is part of the matching character class, and how to match nothing at all when one of the excluded prefixes is found.
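
A minimal Ruby sketch that reproduces the suffix match (the variable names are only for illustration):

```ruby
# The attempted pattern from above.
attempt = /\b(?!elem-|asdf-)([\w\-]+)\b/

# The lookahead rejects position 0, but \b also holds between "m" and "-",
# so the engine retries there and captures the rest of the string.
excluded = "elem-this-cannot-be-matched"
captured = excluded.scan(attempt).flatten

p captured
# => ["-this-cannot-be-matched"]
```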

I am currently working with the Oniguruma engine (Ruby 1.9+/PHP multi-byte string module).

If possible, please elaborate on the solution. Thanks a lot!

Upvotes: 0

Views: 32

Answers (1)

The fourth bird

Reputation: 163362

If a lookbehind is supported, you can assert a whitespace boundary to the left and make the alternation of the two words optional inside the lookahead, so that a leading hyphen is rejected with or without elem or asdf in front of it.

(?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b

Explanation

  • (?<!\S) Assert a whitespace boundary to the left
  • (?! Negative lookahead, assert that what is directly to the right is not
    • (?:elem|asdf)?- Optionally match elem or asdf followed by -
  • ) Close the lookahead
  • [\w-]+ Match 1+ word chars or -
  • \b A word boundary
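
As a quick check in Ruby, here is a small sketch that runs the pattern over the sample strings from the question. The `text` and `matches` names are only for illustration, and the input is simply the question's examples joined with spaces:

```ruby
# Pattern from the answer: whitespace boundary on the left, then reject
# an optional "elem"/"asdf" followed by "-" (which also covers a lone "-").
pattern = /(?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b/

# Sample input assembled from the strings in the question.
text = "any-dashed-strings this-can-be-matched-even-though-its-big " \
       "elem-this-cannot-be-matched asdf-this-cannot-be-matched -"

# No capture groups in the pattern, so scan returns the full matches.
matches = text.scan(pattern)

p matches
# => ["any-dashed-strings", "this-can-be-matched-even-though-its-big"]
```

Because (?<!\S) fails everywhere inside a word, the engine cannot restart in the middle of elem-this-cannot-be-matched and pick up the suffix.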


Or a version with a capture group and without a lookbehind:

(?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b
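
With this variant the full match can include the leading whitespace character, so the value you want is in group 1. A short Ruby sketch, again using the question's sample strings (variable names are illustrative):

```ruby
# Variant without a lookbehind: the left boundary is consumed by (?:\s|^),
# so the wanted text is read from capture group 1.
grouped = /(?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b/

text = "any-dashed-strings this-can-be-matched-even-though-its-big " \
       "elem-this-cannot-be-matched asdf-this-cannot-be-matched -"

# With one capture group, scan returns one-element arrays; flatten them.
values = text.scan(grouped).flatten
p values
# => ["any-dashed-strings", "this-can-be-matched-even-though-its-big"]

# The whole match keeps the leading space, group 1 does not.
m = " any-dashed-strings".match(grouped)
p m[0]  # => " any-dashed-strings"
p m[1]  # => "any-dashed-strings"
```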


Upvotes: 1
