user3605127

Reputation: 23

regex pattern to find word repetition between two specific words

I am able to find the repetition pattern in a given sentence using (.+)(?=\1+). But when I try the same thing between two specific words in a sentence, it fails with "no match".

Am I missing something here?

Example:

abc def def def def ghi ghi xyz

When I use /abc (.+)(?=\1+) xyz/, it fails with no match.
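To reproduce the behaviour described above, here is a minimal sketch that assumes Python's re module as the regex engine (the question doesn't say which engine is in use). The variable names plain and anchored are just for illustration:

```python
import re

s = "abc def def def def ghi ghi xyz"

# Without anchor words: (.+) finds a substring that is immediately repeated.
plain = re.search(r"(.+)(?=\1+)", s)

# With anchor words: (.+) must start right after "abc " (so it begins with "d"),
# but the text the lookahead inspects is " xyz", which begins with a space.
anchored = re.search(r"abc (.+)(?=\1+) xyz", s)

print(repr(plain.group(1)))  # ' def def'
print(anchored)              # None
```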

I don't want to put the first and second words inside any conditional parentheses, so I want the regex to be of this form: /abc regex expression def/

Upvotes: 0

Views: 623

Answers (1)

Kyle Strand

Reputation: 16499

In response to your edit, the pattern you first used isn't working because (1) you're not properly accounting for spaces, (2) you're over-specifying where the matched segment must begin and end, and (3) you're not really using the look-ahead feature correctly. Here's a more in-depth explanation:

  • By specifying that the pattern is between abc and def, you limit the substrings that (.+) can match to def def def, def def, or def.
  • Presumably, you want the first of those three options to be matched. But note that because your second group is a lookahead, the first group must match that entire substring.
  • A lookahead asserts that the text immediately following it matches the lookahead's contents, without consuming that text. In your pattern, the text following the lookahead is " def", so your pattern requires " def" to match \1+. But " def" starts with a space, so it can't begin with any of the strings \1 might hold (def def def, def def, or def).
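As a rough check of these three points, the following sketch (again assuming Python's re module; the names start, candidates and results are hypothetical helpers) lists every string (.+) can capture between "abc " and " def" and tests whether the text the lookahead sees could satisfy \1+:

```python
import re

s = "abc def def def def ghi ghi xyz"
start = len("abc ")  # (.+) begins right after "abc "

# Every string (.+) could capture so that " def" immediately follows it.
candidates = [s[start:i] for i in range(start + 1, len(s)) if s.startswith(" def", i)]

results = []
for group in candidates:
    rest = s[start + len(group):]  # the text the lookahead (?=\1+) inspects
    # \1+ needs `rest` to begin with one or more copies of the captured text,
    # but `rest` begins with a space, while every candidate begins with "d".
    repeated = re.match("(?:" + re.escape(group) + ")+", rest)
    results.append(repeated)
    print(repr(group), "-> lookahead sees", repr(rest[:4]), "->", repeated)

full = re.search(r"abc (.+)(?=\1+) def", s)
print(full)  # None
```

The (?:...) wrapper matters: re.escape(group) + "+" alone would repeat only the last character of the group, not the whole captured string.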

What you actually seem to be trying to do is specify that the matched segment of the string should be preceded by the word abc and followed by the word def. In that case, just use a lookbehind and a lookahead:

/(?<=\babc\b).*?(\w+)\W+(\1\b\W*)+.*?(?=\bdef\b)/

I got rid of your original lookahead; the match you want (i.e. the word that's repeated) is in the first capture group (i.e. the variable $1). Note that I'm using \w and \W to distinguish between word and non-word characters in addition to the \b "word boundary" zero-width atom.
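A minimal sketch of the lookbehind/lookahead pattern, assuming Python's re module as the engine (it supports fixed-width lookbehind, backreferences and \b). Perl's $1 corresponds to m.group(1) here:

```python
import re

s = "abc def def def def ghi ghi xyz"

# abc and def are checked by zero-width lookarounds, so they are not consumed.
pattern = re.compile(r"(?<=\babc\b).*?(\w+)\W+(\1\b\W*)+.*?(?=\bdef\b)")

m = pattern.search(s)
print(m.group(1))        # def  (the repeated word; $1 in Perl)
print(repr(m.group(0)))  # the matched segment, without "abc" or the final "def"
```

Because the lookarounds are zero-width, neither abc nor the terminating def appears in m.group(0).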

EDIT: The lookahead/lookbehind are actually unnecessary. Since you want to use a pattern without them, here's the version you want:

/\babc\b.*?(\w+)\W+(\1\b\W*)+.*?\bdef\b/
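The same check with the lookaround-free pattern, again assuming Python's re module:

```python
import re

s = "abc def def def def ghi ghi xyz"

# No lookarounds: abc and the terminating def are now part of the match itself.
pattern = re.compile(r"\babc\b.*?(\w+)\W+(\1\b\W*)+.*?\bdef\b")

m = pattern.search(s)
print(m.group(1))  # def
print(m.group(0))  # abc def def def def
```

With this input, the last def in the run is consumed by the trailing \bdef\b, so the repeated group itself only covers the earlier copies.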

Upvotes: 1
