user1191027

Reputation:

Regex Lookahead & Lookbehind Matching

I need to take a string and extract every instance of a pattern and only the pattern.

String test = "This is a test string to experiment with regex by separating every instance of the word test and words that trail test";

The pattern has to match the word test together with any words before and after it that are not test. For the string above, that should give 3 matches.

The 3 results that I'm expecting are as follows:

  1. This is a test string to experiment with regex by separating every instance of the word
  2. test and words that trail
  3. test

I've played around with positive and negative lookahead on gskinner, but no luck yet.

Upvotes: 2

Views: 157

Answers (2)

Olaf Dietsche

Reputation: 74028

To follow up on my comment: I could imagine splitting your test string with the pattern \btest\b and then joining each pair of neighbouring parts, left and right, around the word test:

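// In a Java string literal the backslash must be doubled; a single "\b" would be a backspace character.
String[] parts = test.split("\\btest\\b", -1);
// Rejoin each pair of neighbouring parts around the "test" that separated them.
for (int i = 0; i < parts.length - 1; ++i)
    System.out.println(parts[i] + "test" + parts[i + 1]);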

Upvotes: 0

stema

Reputation: 92986

Try this

(\s*\b(?!test\b)[a-z]+\b\s*)*test(\s*\b(?!test\b)[a-z]+\b\s*?)*

See it here on Regexr.

In Java, I would replace [a-z] with \p{L}, but regexr does not support Unicode properties. \p{L} matches any Unicode code point that has the "letter" property, so it matches letters of any language, upper or lower case.
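To illustrate the difference between the two character classes, here is a minimal sketch. The class and method names (LetterClassDemo, asciiOnly, anyLetter) are made up for this example and are not part of the question or of any library.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares [a-z] with \p{L} on whole words.
class LetterClassDemo {
    // True only if the whole string consists of lowercase ASCII letters.
    static boolean asciiOnly(String s) {
        return Pattern.matches("[a-z]+", s);
    }

    // True if the whole string consists of Unicode letters of any script or case.
    static boolean anyLetter(String s) {
        return Pattern.matches("\\p{L}+", s);
    }

    public static void main(String[] args) {
        System.out.println(asciiOnly("test"));          // true
        System.out.println(asciiOnly("This"));          // false: capital T is outside [a-z]
        System.out.println(anyLetter("This"));          // true
        System.out.println(asciiOnly("pr\u00fcfung"));  // false: u-umlaut is outside [a-z]
        System.out.println(anyLetter("pr\u00fcfung"));  // true
    }
}
```

Note that [a-z] without a case-insensitive flag also rejects the capitalised "This" at the start of the test string, which is another reason to prefer \p{L} here.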

Explanation:

(\s*\b(?!test\b)[a-z]+\b\s*)* matches a series of words that are not "test". The negative lookahead assertion (?!test\b) ensures that none of these words is "test".

test matches the literal word "test".

At the end comes the same again: (\s*\b(?!test\b)[a-z]+\b\s*?)* matches a series of words that are not "test". The trailing \s*? is lazy, so the match does not swallow the whitespace after its last word.
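Put together in Java, this could look roughly like the following sketch. It assumes [a-z] is swapped for \p{L} as suggested above; the class and method names (TestPhraseExtractor, extract) are only placeholders for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects every match of the pattern from this answer.
class TestPhraseExtractor {
    // Same pattern as above with [a-z] replaced by \p{L};
    // every backslash is doubled because this is a Java string literal.
    private static final Pattern PHRASE = Pattern.compile(
        "(\\s*\\b(?!test\\b)\\p{L}+\\b\\s*)*test(\\s*\\b(?!test\\b)\\p{L}+\\b\\s*?)*");

    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = PHRASE.matcher(input);
        // find() resumes after the previous match, so matches never overlap.
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String test = "This is a test string to experiment with regex by separating "
                    + "every instance of the word test and words that trail test";
        for (String phrase : extract(test)) {
            System.out.println("[" + phrase + "]");
        }
    }
}
```

For the string from the question this prints the three expected results, each in brackets: the long first phrase ending in "of the word", then "test and words that trail", then "test". Because the matcher continues after each match, the words between two occurrences of test go to the earlier match only.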

Upvotes: 3
