How to iterate over a multiline string with Perl's regex

Question

I need to extract several sections from a multiline string with Perl. I'm applying the same regex in a while loop. My problem is getting the last section, which ends at the end of the file. My workaround is to append the marker to the string, so that the regex always finds an end. Is there a better way to do it?

Example file:

Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Perl script:

#!/usr/bin/env perl

my $desc = do { local $/ = undef; <> };

$desc .= "\n===="; # set the end marker

while($desc =~ /^==== (?<filename>.*?)#.*?====$(?<content>.*?)(?=^====)/mgsp) {
  print "filename=", $+{filename}, "\n";
  print "content=", $+{content}, "\n";
}

This way the script finds both segments. How can I avoid adding the marker?

ikegami · Accepted Answer

Use of the non-greedy modifier ? is a giant red flag. You can usually get away with using it once in a pattern, but using it more than once is usually a bug. If you want to match text that doesn't contain a string, use the following instead:

(?:(?!STRING).)*

So that gets you the following:

/
   ^==== [ ] (?<filename> [^\n]+ ) [ ] ==== \n
   (?<content> (?:(?! ^==== ).)* )
/xsmg
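
To illustrate the difference between .*? and the (?:(?!STRING).)* idiom in isolation, here is a minimal sketch. The input string and the variable names are made up for this example and are not part of the question's data.

```perl
use strict;
use warnings;

# Hypothetical input: two bracketed items, only the second is followed by "!".
my $text = "[a] [b]!";

# Non-greedy: .*? starts short, but it is still allowed to run past "]"
# when that is the only way for the rest of the pattern to match.
my ($lazy) = $text =~ /\[(.*?)\]!/s;

# Tempered: a character is consumed only if "]" does not start at that
# position, so the capture can never contain "]".
my ($tempered) = $text =~ /\[((?:(?!\]).)*)\]!/s;

print "lazy=<<$lazy>>\n";
print "tempered=<<$tempered>>\n";
```

The non-greedy version captures a] [b, because .*? will cross a ] whenever that lets the overall match succeed. The tempered version can only capture b.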

Code:

my $desc = do { local $/; <DATA> };

while (
   $desc =~ /
      ^==== [ ] (?<filename> [^\n]+ ) [ ] ==== \n
      (?<content> (?:(?! ^==== ).)* )
   /xsmg
) {
   print "filename=<<$+{filename}>>\n";
   print "content=<<$+{content}>>\n";
}

__DATA__
Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Output:

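filename=<</home/src/file1.c#1>>
content=<<content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

>>
filename=<</home/src/file2.c#1>>
content=<<content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2
>>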
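
If you would rather keep a lookahead like the one in the question, another option is to let the lookahead also accept the end of the string with \z, which removes the need for the appended marker. This is a different technique from the pattern above and is shown only as a sketch; it uses the non-greedy modifier exactly once. The sample data is shortened, and the @sections array is introduced just for this illustration.

```perl
use strict;
use warnings;

# Shortened sample input, assumed to follow the layout from the question.
my $desc = <<'END';
Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
END

my $re = qr{
    ^==== [ ] (?<filename> [^\n]+ ) [ ] ==== \n   # header line
    (?<content> .*? )                             # body, as short as possible
    (?= ^==== | \z )                              # next header or end of string
}xms;

# Collect each section; no end marker is appended to the input.
my @sections;
while ($desc =~ /$re/g) {
    push @sections, { filename => $+{filename}, content => $+{content} };
}

for my $s (@sections) {
    print "filename=<<$s->{filename}>>\n";
    print "content=<<$s->{content}>>\n";
}
```

Here \z matches only at the very end of the string, so the last section runs to the end of the input while earlier sections still stop just before the next ==== line.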
