IanWatson

Reputation: 1739

Backreference to entire nested regex

I have a regex of ((.*)\n)*?stopcondition

The aim of this regex is to match any number of lines up to the stop condition and then replace the stop condition.

For example

a
b
stop condition

becomes

a
b
changed condition

Another example:

a
b
c
d
stop condition

becomes

a
b
c
d
changed condition

The issue I'm having is using a nested back reference to all the lines captured before the stop condition.

I've currently resorted to writing two regexes, one for the case of 2 lines before the stop condition and one for 4 lines before.

Is there some syntactic sugar I'm missing to get a reference to the entire match?

If I use a standard $ back reference in this situation, it only contains the last line found before the stop condition.

Upvotes: 2

Views: 201

Answers (2)

user557597

Consider what ((.*)\n)*?stopcondition actually says.

It matches whatever comes before stopcondition, no matter what that is.

So ((.*)\n)*? adds nothing, since the engine already finds the first
(leftmost in the source) occurrence of the literal stopcondition on its own.

Even if the prefix did contain something that must appear before stopcondition,
it is just being put back without any modification.

In that case, since you're using Perl, use the \K construct.
(Note: some other engines that use PCRE or its style have this construct,
along with its siblings (*SKIP)(*FAIL).)

Definition:
\K Keep the stuff left of the \K, don't include it in $&

The text left of \K is consumed, but is not part of the match.
This ensures you're matching the right stopcondition without including
the text matched before it.

Find: ((.*)\n)*?\Kstopcondition
Replace: changedcondition

Analyze this ((.*)\n)*? now.

 (                             # (1 start)
      ( .* )                        # (2)
      \n
 )*?                           # (1 end)

Group 1 is overwritten on each quantified pass of ()*?,
so you only ever see what matched on the very last pass.
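
The overwriting can be observed directly. The following is a minimal sketch in Python rather than Perl (the language and the sample text are assumptions chosen for illustration; Python's built-in re module treats a quantified capture group the same way, keeping only the last pass):

```python
import re

# Hypothetical sample input: three lines, then the stop condition.
text = "a\nb\nc\nstopcondition\n"

# Original pattern: group 1 sits inside the quantifier.
m = re.match(r"((.*)\n)*?stopcondition", text)

whole = m.group(0)      # everything the pattern consumed
last_pass = m.group(1)  # only the final iteration of the repeated group

print(repr(whole))
print(repr(last_pass))
```

The whole match covers all three lines plus stopcondition, while group 1 holds only the line from the final pass.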

In this however,

 (                             # (1 start)
      (                             # (2 start)
           ( .* )                        # (3)
           \n 
      )*?                           # (2 end)
 )                             # (1 end)

Group 1 is not quantified, so it holds everything matched by the repeated groups
2 and 3, which are themselves overwritten on each pass.
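
As a rough illustration of the wrapped version, here is a sketch in Python (an assumption, since the answer itself targets Perl; the sample text and the replacement string changedcondition are taken from the examples in this thread). The unquantified outer group can be used as a back reference in the replacement:

```python
import re

# Hypothetical sample input: three lines, then the stop condition.
text = "a\nb\nc\nstopcondition\n"

# Outer group 1 wraps the whole repetition, so it is filled exactly once.
pattern = re.compile(r"(((.*)\n)*?)stopcondition")

m = pattern.match(text)
prefix = m.group(1)  # all lines before the stop condition

# \1 re-inserts the captured prefix; only the stop condition changes.
result = pattern.sub(r"\1changedcondition", text, count=1)

print(repr(prefix))
print(repr(result))
```

Groups 2 and 3 still hold only the last line, but group 1 contains every line that preceded the stop condition.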

P.S. Get some software that knows how to format, analyze, test and benchmark regex.
regexformat.com

Upvotes: 1

c0d3rman
c0d3rman

Reputation: 667

How about this:

^((?:.|\n)*\n)stop condition

(replace with: $1changed condition)

This looks for the beginning of a line, followed by any number of characters or newlines, and then a newline and a stop condition. The inner group is a non-capturing group ((?:stuff)) because we only care about capturing the whole chunk of stuff that came before.
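
A quick way to try this pattern is the sketch below. It uses Python as an assumed stand-in for whatever tool performs the replacement, so the back reference is written \1 instead of $1, and the sample text is the first example from the question:

```python
import re

# First example from the question: two lines, then the stop condition.
text = "a\nb\nstop condition\n"

# Non-capturing inner group; group 1 captures the whole preceding chunk.
pattern = r"^((?:.|\n)*\n)stop condition"

# \1 puts the captured lines back unchanged in front of the new text.
replaced = re.sub(pattern, r"\1changed condition", text)

print(replaced)
```

Because the repetition is greedy, an input containing several stop conditions would have group 1 extend up to the last one.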

If you don't care about starting at the beginning of a line and the stop condition being on its own line you can use the slightly simpler

((?:.|\n)*)stop condition

Although if stop condition is a unique string that appears nowhere else in the file, you could just do a straight search and replace for stop condition and changed condition.

Upvotes: 2
