Carsten

Reputation: 39

Change text block with nearest search pattern

I want to search for and replace a text block.

I tried this, but I know it doesn't work as expected:

perl -0777 -pe 's/(section:[\s\S]*?"MARKER")/\$1 =~ s/"MARKER"/"NEW MARKER"/gr/e' input.txt

Curious about answers :-)

My text file looks like this:

active: true
sections:
- others: "a"
  othersa: "aa"
- section:
  - addPermissions: []
    attribute: "attrA"
    permission: "permA"
    a: "this is some text"
    value: "MARKER"
- section:
  - addPermissions: []
    a: "this is some text"
    value: "M7"
- section:
  - addPermissions: []
    attribute: "attrB"
    value: "MARKER"
- section:
  - addPermissions: []
    a: "this is some text"
    value: "M8"
- section:
  - addPermissions: []
    a: "this is some text"
    value: "M9"
- section:
  - addPermissions: []
    permission: "permC"
    d: "this is some text"
    value: "MARKER"

I want to find every "- section:" block whose value is "MARKER".

Each block found shall be copied and pasted directly after itself, with "MARKER" replaced by "NEW MARKER".

The result shall look like this:

active: true
sections:
- others: "a"
  othersa: "aa"
- section:
  - addPermissions: []
    attribute: "attrA"
    permission: "permA"
    a: "this is some text"
    value: "MARKER"
- section:
  - addPermissions: []
    attribute: "attrA"
    permission: "permA"
    a: "this is some text"
    value: "NEW MARKER"
- section:
  - addPermissions: []
    a: "this is some text"
    value: "M7"
- section:
  - addPermissions: []
    attribute: "attrB"
    value: "MARKER"
- section:
  - addPermissions: []
    attribute: "attrB"
    value: "NEW MARKER"
- section:
  - addPermissions: []
    a: "this is some text"
    value: "M8"
- section:
  - addPermissions: []
    a: "this is some text"
    value: "M9"
- section:
  - addPermissions: []
    permission: "permC"
    d: "this is some text"
    value: "MARKER"
- section:
  - addPermissions: []
    permission: "permC"
    d: "this is some text"
    value: "NEW MARKER"

Upvotes: 1

Views: 160

Answers (4)

user3408541

Reputation: 71

Interesting question! I got it working as expected, but the [ in the - addPermissions: [] line was giving me problems, as if it wasn't being escaped properly. I tried using \Q ... \E blocks, and I tried using quotemeta. They kind of worked, but I didn't want to leave the escapes in the output.

The solution I came up with was to manually escape the [, do the search and replace, then manually put it back the way it was. It's a strange step, but I couldn't find a way around it.

Here is the code...

#!/usr/bin/perl -w

undef $/;        #grab entire file at once because there are newlines
while(<>){        #loop only once per file
  my $i = 1;
  $entireFile = $_; #stored a copy of the string, this is necessary so the pos($_) value works
                    #properly with the /g flag.  Otherwise pos($_) would reset, and the next
                    #while loop would turn into an infinite loop or some error.
                    #Not an obvious bug there.
  while(/(- section:[\w\W]+?value: )("\w+")/g){
    my ($section, $value)  = ($1,$2);
    if($value eq '"MARKER"'){
      #DEBUG: print "Match $i found: \n\"${section}${value}\"\n";
      $section=~s/\[/\Q[\E/g; #WORKAROUND: strange error that the [ was not escaped properly, this is a bit of a workaround
      $entireFile=~s/${section}${value}/${section}${value}\n${section}\"NEW MARKER\"/; #perform search and replace
                                                                                       #on COPY not original, otherwise
                                                                                       #this would reset pos on the original
                                                                                       #and cause an infinite loop or some error
    }
    $i++;
  }
  $entireFile=~s/\Q\[\E/\[/g; #Now undo the workaround, leaving it the way it was originally
  print "$entireFile";         #Print copy with replacement, original string is preserved
}

Output looks like this...

$ perl new.marker.pl new.marker.txt
active: true
sections:
- others: "a"
  othersa: "aa"
- section:
  - addPermissions: []
    attribute: "attrA"
    permission: "permA"
    a: "this is some text"
    value: "MARKER"
- section:
  - addPermissions: []
    attribute: "attrA"
    permission: "permA"
    a: "this is some text"
    value: "NEW MARKER"
- section:
  - addPermissions: []
    a: "this is some text"
    value: "M7"
- section:
  - addPermissions: []
    attribute: "attrB"
    value: "MARKER"
- section:
  - addPermissions: []
    attribute: "attrB"
    value: "NEW MARKER"
- section:
  - addPermissions: []
    a: "this is some text"
    value: "M8"
- section:
  - addPermissions: []
    a: "this is some text"
    value: "M9"
- section:
  - addPermissions: []
    permission: "permC"
    d: "this is some text"
    value: "MARKER"
- section:
  - addPermissions: []
    permission: "permC"
    d: "this is some text"
    value: "NEW MARKER"

Upvotes: -1

ikegami

Reputation: 386541

This matches the whole section by grabbing all the consecutive subsequent lines that are further indented than the first:

^ (\h*) - \h+ section: .* \n (?: \1 \h .* \n )*

Since we're getting the whole section, the marker doesn't need to be the last line of the section.


Putting that to use, we could do a search and replace using /e, as you did.

s{
   ^ (\h*) - \h+ section: .* \n (?: \1 \h .* \n )*
}{
   my $section = $&;
   if ( $section =~ /\bMARKER\b/ ) {
      $section . ( $section =~ s/\bMARKER\b/NEW MARKER/gr )
   } else {
      $section
   }
}xmeg;

But that's a lot of extra matches which can be avoided as follows:

s{
   ( ^ (\h*) - \h+ section: .* \n
   (?: \2 \h .* \n )*
   \2 \h .* \b ) MARKER ( \b .* \n
   (?: \2 \h .* \n )* )
}{
   $& . $1 . "NEW MARKER" . $3
}xmeg;

Note that you can use perl -gpe'...' instead of perl -0777pe'...' since 5.36. For example,

perl -gpe'
   s{
      ( ^ (\h*) - \h+ section: .* \n
      (?: \2 \h .* \n )*
      \2 \h .* \b ) MARKER ( \b .* \n
      (?: \2 \h .* \n )* )
   }{
      $& . $1 . "NEW MARKER" . $3
   }xmeg
'

Upvotes: 1

Carsten

Reputation: 39

Thank you very much for providing examples and explanations! In the end I used it like this (but the search result is wrong):

perl -0777 -pe 's/(- section:[\s\S]*?)"MARKER"/$1"MARKER"\n$1"NEW MARKER 2"\n$1"NEW MARKER 3"/g' input.txt

My goal was to find the first block and duplicate it with a different value in place of "MARKER".

Unfortunately, I had to adjust the test data. Failure: the block captured as "$1" is much larger than expected. The block with value: "M7" has been duplicated, which was not intended.

Final result:

Some content ...
- section:
this is some text
this is some more text
value: "MARKER"
- section:
this is some text
this is some more text
value: "NEW MARKER 2"
- section:
this is some text
this is some more text
value: "NEW MARKER 3"
- section:
this is some text
this is some more text
value: "M7"
- section:
this is some text
this is some more text
value: "MARKER"
- section:
this is some text
this is some more text
value: "M7"
- section:
this is some text
this is some more text
value: "NEW MARKER 2"
- section:
this is some text
this is some more text
value: "M7"
- section:
this is some text
this is some more text
value: "NEW MARKER 3"
... content goes on

Upvotes: 1

jhnc

Reputation: 16819

Your code is:

perl -0777 -pe 's/(section:[\s\S]*?"MARKER")/\$1 =~ s/"MARKER"/"NEW MARKER"/gr/e' input.txt

This appears to be attempting a nested substitution.

With /e flag, RHS is actual code, so $ should not be escaped:

perl -0777 -pe 's/(section:[\s\S]*?"MARKER")/$1 =~ s/"MARKER"/"NEW MARKER"/gr/e' input.txt

Conversely, delimiters for nested s/// must be escaped (or different):

perl -0777 -pe 's/(section:[\s\S]*?"MARKER")/$1 =~ s\/"MARKER"\/"NEW MARKER"\/gr/e' input.txt

However, this only rewrites "MARKER" to "NEW MARKER" inside the matched "section...MARKER" text; the block is not duplicated, which is not what seems to be wanted.


Instead, it is simpler to make use of the capture group directly:

perl -0777 -pe 's/(section:[\s\S]*?)"MARKER"/$1"NEW MARKER"/g' input.txt

or use \K, which works like a variable-length lookbehind:

perl -0777 -pe 's/section:[\s\S]*?\K"MARKER"/"NEW MARKER"/g' input.txt

Upvotes: 2
