Claudio Ferraro

Reputation: 4721

Unmatch specific word between some symbols

I'm matching all words of 2-5 letters that are either followed or preceded by a space, a dot or a - with this regular expression:

(([A-Za-z]{2,5}(?=[ \.-]))|((?<=[ \.-])[A-Za-z]{2,5}))

For example with this input

9-13 and 14-18
9-13 and.14-18
9-13 and-14-18

the word and will always be matched. What I'm unable to achieve is to specify a list of words that should not be matched, whatever the preceding and following characters are.

For example, I would like to specify that the word und and the word ind should not be matched, no matter what the previous or next symbol is.

Upvotes: 0

Views: 41

Answers (1)

The fourth bird

Reputation: 163362

You could use

\b(?![ui]nd\b)(?:[A-Za-z]{2,5}(?=[ .-])|(?<=[ .-])[A-Za-z]{2,5}\b)

The pattern matches:

  • \b A word boundary to prevent a partial match
  • (?![ui]nd\b) Negative lookahead, assert not ind or und directly to the right
  • (?: Non capture group, match either
    • [A-Za-z]{2,5}(?=[ .-]) Match 2-5 chars A-Za-z and assert a space, . or - to the right
    • | Or
    • (?<=[ .-])[A-Za-z]{2,5}\b Positive lookbehind, assert a space, . or - to the left and match 2-5 chars A-Za-z followed by a word boundary
  • ) Close non capture group

Note that you don't have to escape the dot in the character class.
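
If the list of excluded words grows beyond und and ind, one option is to build the negative lookahead from a list instead of writing a character class by hand. Here is a minimal Python sketch of that idea. The names EXCLUDED, build_pattern and find_words are made up for illustration, and it assumes the list is non-empty and that the check should stay case sensitive, like the pattern above.

```python
import re

# Hypothetical list of words that must never be matched.
EXCLUDED = ["und", "ind"]


def build_pattern(excluded):
    """Build the pattern above, but with (?!(?:und|ind)\\b) generated from a list.

    Assumes `excluded` is non-empty; an empty alternation would reject every word.
    """
    blocked = "|".join(re.escape(word) for word in excluded)
    return re.compile(
        # Word boundary, then reject any excluded word that ends at a boundary.
        rf"\b(?!(?:{blocked})\b)"
        # Either 2-5 letters followed by space, dot or hyphen,
        # or 2-5 letters preceded by one of those and ending at a boundary.
        r"(?:[A-Za-z]{2,5}(?=[ .-])|(?<=[ .-])[A-Za-z]{2,5}\b)"
    )


def find_words(text, excluded=EXCLUDED):
    """Return all matches of the generated pattern in `text`."""
    return build_pattern(excluded).findall(text)


if __name__ == "__main__":
    for line in ["9-13 and 14-18", "9-13 und 14-18", "9-13 ind.14-18"]:
        print(repr(line), find_words(line))
```

Because the lookahead ends with \b, only the whole words are rejected; a longer word that merely starts with und, such as under, is still allowed to match.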


Upvotes: 1
