Reputation: 41
I need to match a word in English text that appears 2 times in the text. I tried
(^|\ )([^\ ][^\b]*\b).*\ \2\b
but this doesn't match all lines.
Upvotes: 2
Views: 110
Reputation: 336078
There are a few problems with your regex. For example, \b word boundaries cannot be used in a character class (in most regex flavors, \b inside a class means a backspace character instead), so [^\b]* will not work as intended.
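To see why the character class misbehaves, here is a minimal sketch using Python's re module. The question does not name a regex flavor, so Python is an assumption here; other engines may differ in detail.

```python
import re

# Inside a character class, Python treats \b as the backspace
# character (\x08), not as a word boundary.
backspace_match = re.fullmatch(r"[\b]", "\x08")

# So [^\b]* means "any run of characters that are not backspace";
# it runs straight past spaces instead of stopping at the end of a word.
run = re.match(r"[^\b]*", "foo bar")
print(repr(run.group()))
```

On this input the class consumes the whole string rather than just the first word.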
You probably want something like
(?s)\b(\w+)\b.*\b\1\b
This will match the entire text from the first occurrence of the word to the last. This might not be what you actually intended.
Another idea:
(?s)\b(\w+)\b.*?\b\1\b
This will match only the text from the first occurrence of the word to the next.
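The difference between the greedy and the lazy version shows up when a word occurs more than twice. A small sketch, assuming Python's re module and a made-up sample text:

```python
import re

sample = "foo x foo y foo"

# Greedy .* stretches to the LAST repetition of the captured word.
greedy = re.search(r"(?s)\b(\w+)\b.*\b\1\b", sample)

# Lazy .*? stops at the NEXT repetition of the captured word.
lazy = re.search(r"(?s)\b(\w+)\b.*?\b\1\b", sample)

print(repr(greedy.group(0)))
print(repr(lazy.group(0)))
```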
The problem with both these approaches is that for example in a text like
foo bar bar foo
the regex will match from foo to foo, blindly ignoring that there is a duplicate bar in between.
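This can be checked on that sample text. The sketch below assumes Python's re module; because each match consumes everything between the two occurrences of foo, the inner bar pair is never reported.

```python
import re

text = "foo bar bar foo"

# findall returns the captured word (group 1) for each match.
greedy_words = re.findall(r"(?s)\b(\w+)\b.*\b\1\b", text)
lazy_words = re.findall(r"(?s)\b(\w+)\b.*?\b\1\b", text)

# Both patterns consume the whole string in a single match,
# so "bar" is swallowed and never found as a duplicate.
print(greedy_words)
print(lazy_words)
```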
So if you actually want to find all words that occur in duplicate, then use
(?s)\b(\w+)\b(?=.*?\b\1\b)
Explanation:
(?s) # Allow the dot to match newlines
\b(\w+)\b # Match an entire word
(?= # Assert that the following regex can be matched from here:
.*? # Any number of characters
\b\1\b # followed by the word that was previously captured
) # End of lookahead
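Because the lookahead does not consume any text, the search can continue right after each word, so nested duplicates are found too. A minimal sketch in Python (an assumed flavor; the helper name repeated_words is hypothetical, not from the answer):

```python
import re

# The answer's pattern: a word, followed somewhere later by the same word.
DUPLICATE = re.compile(r"(?s)\b(\w+)\b(?=.*?\b\1\b)")

def repeated_words(text):
    """Return the distinct words that occur again later in text,
    in order of first appearance."""
    found = []
    for match in DUPLICATE.finditer(text):
        word = match.group(1)
        # A word occurring three or more times matches more than once,
        # so skip words that were already recorded.
        if word not in found:
            found.append(word)
    return found

print(repeated_words("foo bar bar foo"))  # ['foo', 'bar']
```

Note that the comparison is case-sensitive as written, so "The" and "the" count as different words unless a flag such as re.IGNORECASE is added.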
Upvotes: 3