Regular expressions positive lookbehind + negative lookahead

Question

Given the string "A B C a b B", I want to match words that are repeated (regardless of case). The expected result would be matching either "a" and "B" (the last occurrences of A and B) or "A" and "B" (the first occurrences).

EDIT: I want to match only the first or only the last occurrence of each repeated word.

I know this problem could be solved more easily by splitting the string and counting each token (after lowercasing it).
However, I'd like to try to write a regex that finds those words, just for the sake of practice.
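For comparison, the split-and-count approach could look like the minimal TypeScript sketch below. It assumes words are separated by whitespace and carry no punctuation, and `repeatedWords` is a hypothetical helper name, not something from the question.

```typescript
// Baseline without regex: split on whitespace, count lowercased tokens,
// then keep the first (or last) occurrence of every word seen more than once.
function repeatedWords(text: string, which: "first" | "last"): string[] {
  const tokens = text.split(/\s+/).filter((t) => t !== "");
  const keys = tokens.map((t) => t.toLowerCase());

  // Count each lowercased token.
  const counts = new Map<string, number>();
  for (const k of keys) counts.set(k, (counts.get(k) ?? 0) + 1);

  // Remember the index of the first or the last occurrence of each key.
  const chosen = new Map<string, number>();
  keys.forEach((k, i) => {
    if (which === "last" || !chosen.has(k)) chosen.set(k, i);
  });

  // Keep tokens, in text order, that are repeated and sit at the chosen index.
  return tokens.filter((_, i) => (counts.get(keys[i]) ?? 0) > 1 && chosen.get(keys[i]) === i);
}

console.log(repeatedWords("A B C a b B", "first")); // [ 'A', 'B' ]
console.log(repeatedWords("A B C a b B", "last"));  // [ 'a', 'B' ]
```

This gives a reference answer to check any regex attempt against.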

My first attempt was: (?=\b(\w+)\b.*\b(\1)\b)(\1)
However, it matches the first A, the first B and the second b (A B b).
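The behavior described above can be reproduced with a short sketch. It assumes a TypeScript/Node.js setup targeting ES2020 or later (for `String.prototype.matchAll`), and it uses the `i` flag in place of C#'s `RegexOptions.IgnoreCase`. The asker's actual C# code isn't shown, so this only approximates it.

```typescript
// The first attempt, run case-insensitively ("i") and globally ("g").
const firstAttempt = /(?=\b(\w+)\b.*\b(\1)\b)(\1)/gi;
const attemptInput = "A B C a b B";

// A word matches whenever the same word appears again later, so every
// occurrence except the last one of each word is matched.
const attemptMatches = [...attemptInput.matchAll(firstAttempt)].map((m) => m[0]);
console.log(attemptMatches); // [ 'A', 'B', 'b' ]
```

The lookahead only asks "does this word appear again later?", which is true for every occurrence but the last. That is why the second b is matched along with A and B.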

I was thinking of using a positive lookbehind together with a negative lookahead to fetch the last instance of each repeated word: (?<=.*(?!.*(\w+).*)\1.*)\b\1\b
(In my head it translates to "a word that has been matched before and won't match again".)

Unfortunately, it doesn't work.

Is it possible to use a positive lookbehind and a negative lookahead this way?
Could my regex be fixed?
I've tried to solve it in C#.

This is not homework.

Lucas Trzesniewski · Accepted Answer

Interesting puzzle. Here's my solution:

(\b\w+\b)(?:(?=.*?\b\1\b)|(?<=\b\1\b.*?\1))

The reasoning is as follows:

Match a word: (\b\w+\b)
Then either: (?:...|...)
- Make sure it occurs again later on: (?=.*?\b\1\b)
- Or it already occurred before: (?<=\b\1\b.*?\1)
  
  The second \1 in the lookbehind matches the word that was just captured (the lookbehind is evaluated right after it); the first \1 is the real, earlier duplicate.
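A minimal sketch of the pattern in use. It is written in TypeScript rather than C# because JavaScript engines that implement ES2018 lookbehind (such as V8 in Node.js) accept variable-length lookbehind containing backreferences, as .NET does. The `i` flag supplies the case-insensitivity the question asks for; without it (or `RegexOptions.IgnoreCase` in C#), "A" and "a" would not count as duplicates. Assumes a target of ES2020 or later for `matchAll`.

```typescript
// Every word that has a case-insensitive duplicate before or after it.
const allDuplicates = /(\b\w+\b)(?:(?=.*?\b\1\b)|(?<=\b\1\b.*?\1))/gi;
const dupInput = "A B C a b B";

// Report each matched word together with its index to tell the B's apart.
const dupMatches = [...dupInput.matchAll(allDuplicates)].map((m) => `${m[1]}@${m.index}`);
console.log(dupMatches); // [ 'A@0', 'B@2', 'a@6', 'b@8', 'B@10' ]
```

Only C is skipped: every other word has a duplicate somewhere in the string, so this pattern on its own matches all occurrences rather than just the first or last one.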

Answer for the edited question:

If you only want to match the first occurrence of a duplicated word, we can change the above pattern a bit:

(\b\w+\b)(?=.*?\b\1\b)(?<!\b\1\b.*?\1)

Now the logic is:

Match a word: (\b\w+\b)
Make sure it occurs again later on: (?=.*?\b\1\b)
And make sure it didn't occur before: (?<!\b\1\b.*?\1)

(the same check as before, except with a negative lookbehind)
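The first-occurrence pattern, with the negative lookbehind (?<!\b\1\b.*?\1), can be checked the same way. The sketch also includes a mirrored last-occurrence variant for the other half of the edited question. That second pattern is an illustration derived from the same idea, not part of the original answer, and it is only checked here against the question's sample string. Assumes TypeScript targeting ES2020 or later and the `i` flag for case-insensitivity.

```typescript
// First occurrence only: the word repeats later and was never seen before.
const firstOnly = /(\b\w+\b)(?=.*?\b\1\b)(?<!\b\1\b.*?\1)/gi;
// Hypothetical mirror image (not from the answer): last occurrence only,
// i.e. the word was seen before and never repeats later.
const lastOnly = /(\b\w+\b)(?<=\b\1\b.*?\1)(?!.*?\b\1\b)/gi;

const edgeInput = "A B C a b B";

// matchAll works on a copy of the regex, so reusing a /g pattern is safe here.
const wordsMatching = (re: RegExp): string[] => [...edgeInput.matchAll(re)].map((m) => m[0]);

console.log(wordsMatching(firstOnly)); // [ 'A', 'B' ]
console.log(wordsMatching(lastOnly));  // [ 'a', 'B' ]
```

Swapping which assertion is positive and which is negative is all it takes to move from "first occurrence" to "last occurrence".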
