wyc

Reputation: 55273

Why is the following regex only excluding the first letter of words?

I want the regex to exclude some words. Something like this:

(?!\bhe\b|\bit\b)\w+

However, it's only excluding the first letters of these words. In this case, h and i.


Why is this, and how can I fix it?

https://regexr.com/5gjto

Upvotes: 0

Views: 75

Answers (1)

The fourth bird

Reputation: 163352

The negative lookahead is not anchored, so the regex engine tests the assertion at every position, including before the h and before the e. At the position before the h the assertion fails, because he is directly to the right, but the engine then tries again at the position after the h and before the e.

At that position the assertion succeeds, because \bhe\b cannot match there (there is no word boundary between h and e), so \w+ matches one or more word characters, which is the e.
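
To see this, here is a minimal Python sketch of the pattern from the question. The sample text is made up for illustration, and it assumes Python's re module behaves like the regexr (JavaScript) engine for this pattern, which should hold for lookaheads and \b on ASCII input.

```python
import re

# Pattern from the question: the lookahead is not anchored,
# so it is retried at every position in the string.
unanchored = re.compile(r"(?!\bhe\b|\bit\b)\w+")

# Hypothetical sample text, not from the original post.
sample = "he said it works"

# At index 0 the lookahead rejects "he", but at index 1 there is no
# word boundary before "e", so the lookahead passes and \w+ matches "e".
unanchored_matches = unanchored.findall(sample)
print(unanchored_matches)  # ['e', 'said', 't', 'works']
```

Only the first letter of he and it is skipped, which matches what the question describes.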

Placing a \b before the lookahead makes sure the assertion is only evaluated at a word boundary, i.e. at the start of a word.

This way the assertion will not run between h and e because the word boundary will not match.

\b(?!he\b|it\b)\w+
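
As a quick check, here is a small Python sketch of the anchored pattern. The sample text is made up for illustration, and it assumes Python's re module treats \b and lookaheads the same way as the engine used on regexr for this input.

```python
import re

# Anchored version: \b must match first, so the lookahead is only
# evaluated at the start of a word, never between "h" and "e".
anchored = re.compile(r"\b(?!he\b|it\b)\w+")

# Hypothetical sample text, not from the original post.
sample_text = "he said it works, then hello"

anchored_matches = anchored.findall(sample_text)
print(anchored_matches)  # ['said', 'works', 'then', 'hello']
```

Note that hello and then are still matched: the \b after he in the lookahead means only the whole word he is excluded, not words that merely start with or contain it.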


Upvotes: 3
