Basque0407

Reputation: 59

Regex: Capture Everything between two words that does not have a specific string in the middle

Example Strings:

Dandelion The animal dog is blue

The animal cat is blue

Alcohol The animal cow is blue water

I need a regex that captures every instance that starts with the word 'The' and ends with the word 'blue', but doesn't have the word 'cat' between these two words.

What I tried:

The.*?(?!cat)blue

Desired Result:

2 Matches:

The animal dog is blue

The animal cow is blue

Any help would be appreciated greatly

Upvotes: 1

Views: 1420

Answers (2)

Casimir et Hippolyte

Reputation: 89639

You can use the character classes \w (word characters) and \W (non-word characters), together with the word boundary \b that matches between them. To forbid a word, test for it at a word boundary with a negative lookahead (?!...) (not followed by ...):

\bThe\W+(?:(?!cat\b|blue\b)\w+\W+)*blue\b

or, with a Perl-compatible regex engine (one that supports possessive quantifiers):

\bThe\W++(?:(?!cat\b|blue\b)\w+\W+)*+blue\b

This way, only the whole word cat is forbidden; words that merely contain it, such as scat or catering, are still allowed.
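As a rough check of the first pattern against the example strings from the question, here is a small Python sketch. The helper name find_matches and the extra "scat" test line are illustrative additions, not part of the original answer.

```python
import re

# First pattern from the answer: words are consumed one at a time, and the
# lookahead refuses to step over the whole word "cat" (or over "blue").
PATTERN = re.compile(r"\bThe\W+(?:(?!cat\b|blue\b)\w+\W+)*blue\b")

def find_matches(text):
    """Return every substring of text matched by PATTERN (hypothetical helper)."""
    return [m.group(0) for m in PATTERN.finditer(text)]

# The three example strings from the question, joined into one text.
text = "\n".join([
    "Dandelion The animal dog is blue",
    "The animal cat is blue",
    "Alcohol The animal cow is blue water",
])

for match in find_matches(text):
    print(match)

# A word that only contains "cat" is not rejected.
print(find_matches("The animal scat is blue"))
```

Note that \W+ also matches line breaks, so on multi-line input a match could in principle run across lines; matching line by line avoids that if it matters.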

Upvotes: 2

Oliver Too Eh

Reputation: 171

".*?" can consume anything, including "cat", and "(?!cat)" only checks the single position right before "blue". The text at that position is "blue", never "cat", so the lookahead always succeeds and filters nothing.
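A quick way to see this is to run the attempted pattern from the question over the three example strings. This is only a sketch in Python; the names ATTEMPT, examples and attempt_results are made up for the illustration.

```python
import re

# The pattern tried in the question. The lookahead sits directly in front
# of "blue", so it only asks "does 'cat' start here?", which is never true.
ATTEMPT = re.compile(r"The.*?(?!cat)blue")

examples = [
    "Dandelion The animal dog is blue",
    "The animal cat is blue",
    "Alcohol The animal cow is blue water",
]

# Collect the first match on each line (None when nothing matches).
attempt_results = []
for line in examples:
    m = ATTEMPT.search(line)
    attempt_results.append(m.group(0) if m else None)

for result in attempt_results:
    print(result)
```

All three lines produce a match, including the one containing cat, which is the behaviour the question is trying to avoid.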

I would include the condition "not matching anything followed by cat" before matching "anything followed by blue" as follows:

The(?!.*cat).*blue
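Here is a small Python sketch of how this pattern behaves, assuming each string is matched on its own line. The helper first_match and the last two test strings are illustrative assumptions, added to show two limits of the approach: the lookahead scans the rest of the line (not just the part before "blue"), and ".*" is greedy.

```python
import re

# Pattern from this answer: right after "The", refuse to match if "cat"
# appears anywhere later on the same line.
PATTERN = re.compile(r"The(?!.*cat).*blue")

def first_match(line):
    """Return the first match on the line, or None (hypothetical helper)."""
    m = PATTERN.search(line)
    return m.group(0) if m else None

print(first_match("Dandelion The animal dog is blue"))      # The animal dog is blue
print(first_match("The animal cat is blue"))                # None
print(first_match("Alcohol The animal cow is blue water"))  # The animal cow is blue

# Limit 1: "cat" after "blue" also blocks the match.
print(first_match("The animal dog is blue, unlike the cat"))  # None

# Limit 2: greedy ".*" runs to the last "blue" on the line.
print(first_match("The dog is blue and the sky is blue"))
```

It also treats cat as a plain substring, so a line containing a word such as catering after "The" would be rejected as well.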

Upvotes: 0
