Regex Lookahead & Lookbehind Matching

Question

I need to take a string and extract every instance of a pattern and only the pattern.

String test = "This is a test string to experiment with regex by separating every instance of the word test and words that trail test";

So the pattern has to find the word test together with the words before and after it that are not test. For this string, that should produce 3 matches.

The 3 results that I'm expecting are as follows:

This is a test string to experiment with regex by separating every instance of the word
test and words that trail
test

I've played around with positive and negative lookahead on gskinner's RegExr, but no luck yet.

stema · Accepted Answer

Try this

(\s*\b(?!test\b)[a-z]+\b\s*)*test(\s*\b(?!test\b)[a-z]+\b\s*?)*

See it here on Regexr.

In Java, I would replace [a-z] with \p{L}, but Regexr does not support Unicode properties. \p{L} matches any Unicode code point with the property "Letter", so it matches letters in any language.
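For reference, here is a minimal Java sketch of how the pattern could be applied with java.util.regex, using \p{L} in place of [a-z] as suggested above. The class and method names (TestSegments, segments) are made up for illustration. A Matcher.find() loop collects each non-overlapping match in order, which for the question's string should give the three expected segments. With the original [a-z] and no case-insensitive flag, the capital T in "This" is not matched by [a-z], so the first match would start at " is a test" instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TestSegments {
    // The accepted answer's regex with [a-z] replaced by \p{L}, so that any
    // Unicode letter (including the capital T in "This") counts as a word letter.
    private static final Pattern SEGMENT = Pattern.compile(
            "(\\s*\\b(?!test\\b)\\p{L}+\\b\\s*)*test(\\s*\\b(?!test\\b)\\p{L}+\\b\\s*?)*");

    // Collects every non-overlapping match, scanning from left to right.
    static List<String> segments(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = SEGMENT.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String test = "This is a test string to experiment with regex by separating "
                + "every instance of the word test and words that trail test";
        // Prints each match in brackets:
        // [This is a test string to experiment with regex by separating every instance of the word]
        // [test and words that trail]
        // [test]
        for (String segment : segments(test)) {
            System.out.println("[" + segment + "]");
        }
    }
}
```

One caveat, to the best of my knowledge: since JDK 19, Java's \b is ASCII-based by default (consistent with \w), so for input containing non-ASCII letters you may want to compile with Pattern.UNICODE_CHARACTER_CLASS so that word boundaries agree with \p{L}.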

Explanation:

(\s*\b(?!test\b)[a-z]+\b\s*)* matches a series of words that are not "test". Before each word, the negative lookahead assertion (?!test\b) ensures that the word is not exactly "test".
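To see what the negative lookahead does on its own, the sketch below pulls out just the word part of that group, \b(?!test\b)\p{L}+\b (again with \p{L} instead of [a-z]), and applies it to a made-up sample string. Because the lookahead ends in \b, it only rejects a word that is exactly "test"; words that start with or contain "test", such as tester, contest and retest, still match. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NotTestWord {
    // One "word that is not test": (?!test\b) rejects a word only when it is
    // exactly "test"; longer words such as "tester" still pass the lookahead.
    private static final Pattern WORD_NOT_TEST = Pattern.compile("\\b(?!test\\b)\\p{L}+\\b");

    // Returns every whole word in the input except the word "test" itself.
    static List<String> wordsOtherThanTest(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD_NOT_TEST.matcher(input);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [tester, contest, retest]
        System.out.println(wordsOtherThanTest("test tester contest test retest"));
    }
}
```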

test matches the literal text "test".

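Finally, (\s*\b(?!test\b)[a-z]+\b\s*?)* again matches a series of words that are not "test". It differs from the first group only in the lazy \s*? at the end, which keeps a match from ending with trailing whitespace.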
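The sketch below compares the answer's lazy trailing \s*? with a hypothetical greedy \s* variant on a short made-up string, using the \p{L} substitution again. With the greedy version, the last word of the trailing group also consumes the space that follows it, so the first match ends with "c " rather than "c". The class name and both pattern constants are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyTrailingSpace {
    // Leading words that are not "test", followed by the literal "test".
    static final String LEADING = "(\\s*\\b(?!test\\b)\\p{L}+\\b\\s*)*test";
    // Trailing group as in the answer: the final \s*? is lazy.
    static final Pattern LAZY = Pattern.compile(LEADING + "(\\s*\\b(?!test\\b)\\p{L}+\\b\\s*?)*");
    // Hypothetical variant for comparison: the final \s* is greedy.
    static final Pattern GREEDY = Pattern.compile(LEADING + "(\\s*\\b(?!test\\b)\\p{L}+\\b\\s*)*");

    // Collects every non-overlapping match of the given pattern.
    static List<String> matches(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "a test b c test d";
        System.out.println(matches(LAZY, s));   // [a test b c, test d]
        System.out.println(matches(GREEDY, s)); // [a test b c , test d]
    }
}
```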
