Eugene Krapivin

Reputation: 1861

Regular expressions positive lookbehind + negative lookahead

Given a string "A B C a b B", I want to match words that are repeated (regardless of case). The expected result would be matching "a" and "b" (last occurrences of A and B) OR "A" and "B" (first occurrences).

EDIT: I want to match only the first or the last occurrence of the word.

I know this question could be better answered by splitting the string and counting each token (after lowercasing it).
However, I'd like to try and formulate a regex to help me find those words, just for the sake of practice.
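
For reference, the split-and-count baseline mentioned above could look roughly like this. It is a minimal TypeScript sketch, assuming whitespace-separated tokens; `repeatedWords` is a hypothetical helper name, not something from the question.

```typescript
// Baseline without lookarounds: split on whitespace, count lowercased tokens.
// `repeatedWords` is a hypothetical name used only for this illustration.
function repeatedWords(text: string): string[] {
  // Null-prototype object so keys such as "constructor" cannot collide.
  const counts: Record<string, number> = Object.create(null);
  const order: string[] = []; // lowercased words in first-seen order
  const tokens = text.split(/\s+/);
  for (let i = 0; i < tokens.length; i++) {
    const key = tokens[i].toLowerCase();
    if (key === "") continue; // skip empty tokens from leading/trailing spaces
    if (counts[key] === undefined) {
      counts[key] = 0;
      order.push(key);
    }
    counts[key] += 1;
  }
  // Keep only the words seen more than once.
  return order.filter((k) => counts[k] > 1);
}

const repeated = repeatedWords("A B C a b B"); // ["a", "b"]
```

This reports which words repeat, but not where; the regex approach is what yields the positions of individual occurrences.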

My first attempt was: (?=\b(\w+)\b.*\b(\1)\b)(\1)
However, it matches the first A, the first B and the second b (A B b).

I was thinking of somehow using a positive look-behind with a negative look-ahead to fetch the last instances of the repeating word: (?<=.*(?!.*(\w+).*)\1.*)\b\1\b
(In my head it translates to "a word that had been matched before and won't match again")

Well, it doesn't work for me unfortunately.

Is it possible to use positive look-behind and negative look-ahead this way?
Could my regex be fixed?
I've tried to solve it in C#.

This is not homework

Upvotes: 4

Views: 1107

Answers (1)

Lucas Trzesniewski

Reputation: 51330

Interesting puzzle. Here's my solution:

(\b\w+\b)(?:(?=.*?\b\1\b)|(?<=\b\1\b.*?\1))


The reasoning is as follows:

  • Match a word: (\b\w+\b)

  • Then either: (?:...|...)

    • Make sure it occurs again later on: (?=.*?\b\1\b)
    • Or it already occurred before: (?<=\b\1\b.*?\1)

      The second \1 in the lookbehind re-matches the word that was just captured, since the lookbehind ends right after it. The first \1 is the real, earlier duplicate.
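
To see what this pattern selects on the sample string, here is a minimal TypeScript sketch. It assumes case-insensitive matching (the `i` flag here; in C# that would be `RegexOptions.IgnoreCase`), which the question asks for but the pattern itself does not state. `findAll` is a hypothetical helper, and the pattern is double-escaped only because it is built from a string.

```typescript
// Pattern from the answer above.
// Raw form: (\b\w+\b)(?:(?=.*?\b\1\b)|(?<=\b\1\b.*?\1))
const anyDuplicate = new RegExp(
  "(\\b\\w+\\b)(?:(?=.*?\\b\\1\\b)|(?<=\\b\\1\\b.*?\\1))",
  "gi" // g: find every match; i: ignore case (an assumption, per the question)
);

// Hypothetical helper: collect [index, matched text] for every match.
function findAll(re: RegExp, text: string): Array<[number, string]> {
  const found: Array<[number, string]> = [];
  re.lastIndex = 0; // start from the beginning of the input
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    found.push([m.index, m[0]]);
    if (m[0].length === 0) re.lastIndex++; // guard against empty matches
  }
  return found;
}

const allHits = findAll(anyDuplicate, "A B C a b B");
// allHits: [[0,"A"], [2,"B"], [6,"a"], [8,"b"], [10,"B"]]
```

Every occurrence of a duplicated word is matched: the earlier ones through the lookahead branch, the final ones through the lookbehind branch. "C" satisfies neither branch and is skipped.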


Answer for the edited question:

If you only want to match the first occurrence of a duplicated word, we can change the above pattern a bit:

(\b\w+\b)(?=.*?\b\1\b)(?<!\b\1\b.*?\1)


Now the logic is:

  • Match a word: (\b\w+\b)
  • Make sure it occurs again: (?=.*?\b\1\b)
  • And make sure it didn't occur before: (?<!\b\1\b.*?\1)

    (the same as before, except with a negative lookbehind)
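
A short TypeScript sketch of the first-occurrence pattern, again assuming case-insensitive matching. The `lastOnly` variant is not part of the answer: it is an assumption obtained by swapping the polarity of the two lookarounds (no later occurrence, but an earlier one), shown because the edited question also mentions the last occurrence.

```typescript
// First occurrence of each duplicated word (pattern from the answer).
// Raw form: (\b\w+\b)(?=.*?\b\1\b)(?<!\b\1\b.*?\1)
const firstOnly = new RegExp(
  "(\\b\\w+\\b)(?=.*?\\b\\1\\b)(?<!\\b\\1\\b.*?\\1)",
  "gi" // i: ignore case (an assumption, per the question)
);

// Last occurrence: NOT from the answer, a variant with the lookarounds flipped.
// Raw form: (\b\w+\b)(?!.*?\b\1\b)(?<=\b\1\b.*?\1)
const lastOnly = new RegExp(
  "(\\b\\w+\\b)(?!.*?\\b\\1\\b)(?<=\\b\\1\\b.*?\\1)",
  "gi"
);

const sample = "A B C a b B";
const firstHits = sample.match(firstOnly) || []; // ["A", "B"]
const lastHits = sample.match(lastOnly) || [];   // ["a", "B"]
```

Note that the last occurrence of B in the sample is the final uppercase "B", not the lowercase "b", so the variant returns "a" and "B".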

Upvotes: 2
