kabell

Reputation: 41

Regex matching a word that appears twice in the text

I need to match a word in English text that appears twice in the text. I tried

(^|\ )([^\ ][^\b]*\b).*\ \2\b

but this doesn't match all lines.

Upvotes: 2

Views: 110

Answers (1)

Tim Pietzcker

Reputation: 336078

There are a few problems with your regex. For example, \b cannot act as a word boundary inside a character class (in many regex flavors it means a backspace character there), so [^\b]* will not work as intended.
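To make the character-class problem concrete, here is a minimal sketch using Python's re module. The choice of Python is an assumption, since the question does not name a regex flavor, and the variable names are only illustrative. In Python, \b inside a character class is read as the backspace character.

```python
import re

# Inside a character class, Python's re module reads \b as a backspace
# character (\x08), not as a word boundary.
backspace_class = re.compile(r"[\b]")
negated_class = re.compile(r"[^\b]*")

text = "foo bar"

# [^\b]* means "any run of non-backspace characters", so it runs past the
# space and swallows the whole string instead of stopping at the word end.
swallowed = negated_class.match(text).group()

matches_backspace = backspace_class.fullmatch("\x08") is not None
matches_letter = backspace_class.fullmatch("f") is not None

print(repr(swallowed))    # 'foo bar'
print(matches_backspace)  # True
print(matches_letter)     # False
```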

You probably want something like

(?s)\b(\w+)\b.*\b\1\b

This will match the entire text from the first occurrence of a repeated word to its last occurrence. This might not be what you actually intended.

Another idea:

(?s)\b(\w+)\b.*?\b\1\b

This will match only the text from the first occurrence of the word to the next.
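The difference between the greedy and the lazy version can be seen side by side in a short sketch. It assumes Python's re module, and the sample text is made up for illustration.

```python
import re

# Hypothetical sample text in which "one" occurs three times.
text = "one two one three one"

# Greedy .* stretches to the LAST later occurrence of the captured word.
greedy = re.search(r"(?s)\b(\w+)\b.*\b\1\b", text)

# Lazy .*? stops at the NEXT occurrence of the captured word.
lazy = re.search(r"(?s)\b(\w+)\b.*?\b\1\b", text)

print(greedy.group())  # one two one three one
print(lazy.group())    # one two one
```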

The problem with both these approaches is that for example in a text like

foo bar bar foo

the regex will match from foo to foo, blindly ignoring that there is a duplicate bar in-between.
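A quick check of this case, again assuming Python's re module: the match consumes the whole text, so the scan never gets a chance to start at bar.

```python
import re

text = "foo bar bar foo"
lazy_pattern = re.compile(r"(?s)\b(\w+)\b.*?\b\1\b")

# Each match consumes the text between the two occurrences, so words
# inside that span are never tried as a starting point.
matches = list(lazy_pattern.finditer(text))
spans = [m.group() for m in matches]
captured = [m.group(1) for m in matches]

print(spans)     # ['foo bar bar foo']
print(captured)  # ['foo']
```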

So if you actually want to find every word that occurs more than once, then use

(?s)\b(\w+)\b(?=.*?\b\1\b)

Explanation:

(?s)       # Allow the dot to match newlines
\b(\w+)\b  # Match an entire word
(?=        # Assert that the following regex can be matched from here:
 .*?       #  Any number of characters (as few as possible)
 \b\1\b    #  followed by the word that was previously captured
)          # End of lookahead
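As a rough usage sketch, assuming Python's re module: because the lookahead consumes nothing, every word gets tested in turn. The helper name repeated_words is hypothetical, and the de-duplication step is an addition for convenience, since a word that occurs three times would otherwise be reported twice. Matching is case-sensitive as written.

```python
import re

# The lookahead pattern from above; only the word itself is consumed.
DUPLICATE_WORD = re.compile(r"(?s)\b(\w+)\b(?=.*?\b\1\b)")

def repeated_words(text):
    """Return each word that occurs again later in text, in first-seen order."""
    seen = []
    for match in DUPLICATE_WORD.finditer(text):
        word = match.group(1)
        if word not in seen:
            seen.append(word)
    return seen

print(repeated_words("foo bar bar foo"))  # ['foo', 'bar']
print(repeated_words("a b a c a"))        # ['a']
```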

Upvotes: 3
