Reputation: 4721
I'm matching all words of 2-5 letters that are either preceded or followed by a space, dot or -
with this regular expression
(([A-Za-z]{2,5}(?=[ \.-]))|((?<=[ \.-])[A-Za-z]{2,5}))
For example with this input
9-13 and 14-18
9-13 and.14-18
9-13 and-14-18
the word and will always be matched.
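To make the current behaviour concrete, here is a minimal Python sketch that runs the expression above against the three sample inputs. Python's re module is an assumption on my part (the regex flavour is not stated here), and the names pattern, samples and find_words are only illustrative.

```python
import re

# The expression from the question, unchanged.
pattern = re.compile(r"(([A-Za-z]{2,5}(?=[ \.-]))|((?<=[ \.-])[A-Za-z]{2,5}))")

samples = ["9-13 and 14-18", "9-13 and.14-18", "9-13 and-14-18"]


def find_words(text):
    """Return every substring matched by the pattern, in order."""
    return [m.group(0) for m in pattern.finditer(text)]


for sample in samples:
    print(sample, "->", find_words(sample))

# The problem: an unwanted word such as "und" is matched in the same way.
print(find_words("9-13 und 14-18"))
```

Each sample yields the single match and, and replacing and with und yields und, which is the match I want to suppress.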
What I'm unable to achieve is to specify a list of words that should not be matched, whatever the preceding and following characters are.
For example, I would like to specify that the words und and ind should not be matched, no matter what the previous or next symbols are.
Upvotes: 0
Views: 41
Reputation: 163362
You could use
\b(?![ui]nd\b)(?:[A-Za-z]{2,5}(?=[ .-])|(?<=[ .-])[A-Za-z]{2,5}\b)
The pattern matches:
\b
    A word boundary to prevent a partial match
(?![ui]nd\b)
    Negative lookahead, assert not ind or und directly to the right
(?:
    Non-capturing group, match either
[A-Za-z]{2,5}(?=[ .-])
    Match 2-5 chars A-Za-z and assert either a space, . or - to the right
|
    Or
(?<=[ .-])[A-Za-z]{2,5}\b
    Positive lookbehind, assert either a space, . or - to the left and match 2-5 chars A-Za-z followed by a word boundary
)
    Close the non-capturing group
Note that you don't have to escape the dot in the character class.
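If the list of excluded words is likely to grow, the negative lookahead can be generated from a list instead of being written by hand. The following is only a sketch, assuming Python's re module; build_pattern and find_words are hypothetical helper names, not part of any library.

```python
import re


def build_pattern(excluded):
    """Build the pattern above with the excluded words taken from a list.

    Hypothetical helper: each word is escaped and joined into one
    alternation inside the negative lookahead.
    """
    alternatives = "|".join(re.escape(word) for word in excluded)
    return re.compile(
        r"\b(?!(?:" + alternatives + r")\b)"
        r"(?:[A-Za-z]{2,5}(?=[ .-])|(?<=[ .-])[A-Za-z]{2,5}\b)"
    )


pattern = build_pattern(["und", "ind"])


def find_words(text):
    """Return every substring matched by the generated pattern, in order."""
    return [m.group(0) for m in pattern.finditer(text)]


for sample in ["9-13 and 14-18", "9-13 und 14-18", "9-13 ind-14-18"]:
    print(sample, "->", find_words(sample))
```

Because the lookahead ends with \b, only the whole words und and ind are excluded. A longer word that merely starts with one of them, such as under, is still matched.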
Upvotes: 1