vesii

Reputation: 3138

Regex to match a line in a multi-lined string in Perl

I have the following code:

use Capture::Tiny qw(capture);
my $cmd = $SOME_CMD;
my ($stdout, $stderr, $exit_status) = capture { system($cmd); };
unless ($exit_status && $stdout =~ /^Repository:\s+(.*)/) {
    my $name = $1;
}

It runs $cmd and tries to parse the output. The output looks like:

Information for package perl-base:

Repository: @System
Name: perl-base
Version: 5.10.0-64.81.13.1

For some reason $name is empty, probably because the capture group cannot match in a multi-line string. I also tried /^Repository:\s+(.*)/s and /^Repository:\s+(.*)$/, but neither worked.

I want $name to contain @System. How can I do that?

Upvotes: 1

Views: 139

Answers (3)

zdim

Reputation: 66964

First, the logic of that unless statement is broken, as it short-circuits on success:

unless ($exit_status && $stdout =~ /^Repository:\s+(.*)/) { ... }

is just a syntactic "convenience" for

if (not ($exit_status && $stdout =~ /^Repository:\s+(.*)/) ) { ... }

So if the command ran successfully and $exit_status is falsey (0 for success) then the &&-ed condition is false right there, and so it short-circuits since it is already decided. Thus the regex never runs and $1 stays undef.

But it gets worse: if $exit_status were a positive number and the regex matched (quite possible), then the &&-ed condition would be true, and with the not applied the whole if would be false, so its block would not run, even though there was valid output from the command (since the regex matched).

So I'd suggest disentangling those double negatives, with something like

if ( $exit_status==0  and  $stdout =~ /.../m ) { ... }  # but see text

Then there must be an elsif ($exit_status) to interrogate further. But a command may return an exit code as it pleases, and some return non-zero merely to communicate specifics even when they ran successfully! So better break that up, to get to see everything, like

if ($exit_status)      { ... }  # interrogate
if ($stdout =~ /.../m) { ... }  # may still have run fine even with exit>0
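
As a minimal sketch of that split, assuming the sample output from the question and an exit status of 0 in place of a real command run (the repository_name helper is a hypothetical name, not part of the original code):

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured repository name,
# or undef if no line starts with "Repository:".
sub repository_name {
    my ($output) = @_;
    return $output =~ /^Repository:\s+(.*)/m ? $1 : undef;
}

# Stand-ins for what capture { system($cmd) } would return (assumed values).
my $exit_status = 0;
my $stdout = <<'END';
Information for package perl-base:

Repository: @System
Name: perl-base
Version: 5.10.0-64.81.13.1
END

# Check the exit status on its own ...
if ($exit_status) {
    warn "Command exited with status $exit_status\n";
}

# ... and parse the output independently of it.
my $name = repository_name($stdout);
print defined $name ? "Repository is $name\n" : "No Repository line found\n";
```

Keeping the two checks separate means a non-zero exit status can be reported without preventing the output from being parsed.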

The moral here, if I may emphasize, is about the dangers of convoluted code, combined logical negatives, meaningful evaluations inside composite conditions, and all that.


Next, as mentioned, the regex attempts to match a pattern in a multiline string while it uses the anchor ^ -- which anchors the pattern to the beginning of the whole string, not to a line within, as clearly intended; so it would not match the shown text.

With the modifier /m added, the behavior of the anchor ^ changes so that it also matches at the beginning of each line within a string.
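
A small comparison may make the difference concrete. This is only a sketch that uses a shortened copy of the sample output from the question; the variable names are chosen for illustration:

```perl
use strict;
use warnings;

# Shortened sample output; the heredoc is single-quoted so @System is literal.
my $text = <<'END';
Information for package perl-base:

Repository: @System
Name: perl-base
END

# Without /m, ^ matches only at the very start of the string.
my $without_m = $text =~ /^Repository:\s+(.*)/  ? $1 : '(no match)';

# With /m, ^ also matches right after each newline inside the string.
my $with_m    = $text =~ /^Repository:\s+(.*)/m ? $1 : '(no match)';

print "without /m: $without_m\n";   # without /m: (no match)
print "with /m:    $with_m\n";      # with /m:    @System
```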


If the logic of that unless statement gets one's head spinning, consider the equivalent

if ( (not $exit_status)  or  (not $stdout =~ /^Repository:\s+(.*)/) ) { ... }

With a falsey $exit_status the first expression, (not $exit_status), is true, so the whole if is true right there and the second expression need not be evaluated, and so it isn't (in Perl).

Try it with a one-liner

perl -wE'if ( 0 and do { say "hi" } ) { say "bye" }'

This doesn't print anything; no hi nor bye. With 0 the whole condition is certainly false so the do block isn't evaluated, and the if's block isn't either.

If we change and to or though (or 0 to 1), then the first condition (0) doesn't decide yet and the second condition is evaluated, so hi is printed. That condition is true (printing statements normally return 1) and so bye prints, too.

Upvotes: 3

TLP

Reputation: 67910

$name is empty because it is declared inside a block, which means it is out of scope outside that block. You would know this if you had used use strict, which does not allow you to access undeclared variables.

What you need to do is to declare the variable outside the block:

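my $name;   # declared outside block
# Test for success (exit status 0) explicitly; with the original
# unless (... && ...) the regex is never run when the command succeeds.
if ($exit_status == 0 && $stdout =~ /^Repository:\s+(.*)/m) {
    $name = $1;
}
print "Name is: $name\n";   # accessible outside the block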

Also, you need to remove the beginning of line anchor ^, or add the /m modifier.

Upvotes: 3

ggorlen

Reputation: 57289

I believe you want the multiline m flag:

use strict;
use warnings;

my $s = 'Information for package perl-base:

Repository: @System
Name: perl-base
Version: 5.10.0-64.81.13.1';

$s =~ /^Repository:\s+(.*)/m;
print $1; # => @System

You can make your regex more accurate by using $ to anchor the end of the line and a literal space followed by + instead of \s+, since \s also matches newlines: /^Repository: +(.*)$/m.
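
To illustrate why the tighter pattern can matter, here is a sketch with a made-up input in which the Repository field is empty (this input is an assumption for demonstration, not output from the real command):

```perl
use strict;
use warnings;

# Hypothetical output where "Repository:" has no value on its line.
my $text = "Repository:\nName: perl-base\n";

# \s+ can consume the newline, so the capture spills onto the next line.
my $loose = $text =~ /^Repository:\s+(.*)/m  ? $1 : '(no match)';

# A literal space followed by + must stay on the same line, so nothing matches.
my $tight = $text =~ /^Repository: +(.*)$/m  ? $1 : '(no match)';

print "loose: $loose\n";   # loose: Name: perl-base
print "tight: $tight\n";   # tight: (no match)
```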

Upvotes: 4
