Gabriel Bauman

Reputation: 2416

Regular expression to select words not starting with one of a set of prefixes

I am trying to figure out a regular expression that selects all words that do NOT begin with one of a set of prefixes.

For example, with the allowed word prefixes jan|feb|mar|apr, I'd want to match the words in, or, I, off, to, and see in the following string:

"in january or feb I marched off to see april"

I managed to select the exact opposite of what I'd like, matching words beginning with the prefixes:

(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?:\w+)?

I also managed to select all words that are not themselves one of the prefixes, but this only excludes words that are exactly a prefix (such as feb), not every word that begins with one (such as january):

[a-z]+\b(?<!\bjan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)

The ultimate goal is to strip all words that do not begin with one of the prefixes from the input string.

Upvotes: 1

Views: 658

Answers (1)

anubhava

Reputation: 785186

The ultimate goal is to strip all words that do not begin with one of the prefixes from the input string.

You may use this regex for matching:

\b(?!(?:jan|feb|mar|apr|may|ju[nl]|aug|sep|oct|nov|dec))\w+\s*

and replace it with an empty string.
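As a minimal sketch of that replacement, assuming Python's standard `re` module (the question does not name a language or regex flavor), the names `NON_MONTH_WORD` and `keep_month_words` below are hypothetical and only for illustration. The final `strip()` is an addition that is not part of the regex above: it tidies the trailing space left behind when the last word of the input is removed.

```python
import re

# Pattern from the answer: a word that does NOT start with a month prefix,
# together with any whitespace that follows it.
NON_MONTH_WORD = re.compile(
    r"\b(?!(?:jan|feb|mar|apr|may|ju[nl]|aug|sep|oct|nov|dec))\w+\s*"
)


def keep_month_words(text: str) -> str:
    """Remove every word that does not begin with one of the month prefixes."""
    # strip() is an extra step (an assumption, not part of the answer):
    # a kept word followed by a removed final word keeps its trailing space.
    return NON_MONTH_WORD.sub("", text).strip()


result = keep_month_words("in january or feb I marched off to see april")
print(result)  # january feb marched april
```

The pattern as written is case-sensitive, so a word such as "January" would be removed unless a case-insensitive flag is added.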


RegEx Details:

  • \b: Word boundary
  • (?!: Start negative lookahead
    • (?:jan|feb|mar|apr|may|ju[nl]|aug|sep|oct|nov|dec): Match any one of the 3-letter month prefixes
  • ): End negative lookahead
  • \w+: Match 1+ word characters
  • \s*: Match 0 or more whitespace characters
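The breakdown above can also be written as a commented pattern. This is only a sketch, assuming Python's `re.VERBOSE` flag (which ignores whitespace in the pattern and allows `#` comments); the name `ANNOTATED` is hypothetical. It lists the words the regex matches in the example string from the question.

```python
import re

# Same regex as in the answer, spread over several lines with comments.
ANNOTATED = re.compile(
    r"""
    \b          # word boundary
    (?!         # start negative lookahead
      (?:jan|feb|mar|apr|may|ju[nl]|aug|sep|oct|nov|dec)  # a 3-letter month prefix
    )           # end negative lookahead
    \w+         # 1+ word characters
    \s*         # 0 or more whitespace characters
    """,
    re.VERBOSE,
)

# With no capturing groups, findall returns each full match,
# including the whitespace consumed by \s*.
matches = ANNOTATED.findall("in january or feb I marched off to see april")
print(matches)  # ['in ', 'or ', 'I ', 'off ', 'to ', 'see ']
```

Because the lookahead is checked right after the word boundary, "marched" is skipped as a whole: it starts with "mar", and there is no other word boundary inside the word where a match could begin.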

Upvotes: 4
