NPS

Reputation: 6355

Regex matches but $1 is uninitialized

My code snippet:

my $URL_PATTERN = qr/http.*html/;
foreach my $urlCandidate(@urlCandidates)
{
    if ($urlCandidate !~ $URL_PATTERN)
    {
        next;
    }
    my $url = $1;
    if ($url !~ $SOME_OTHER_PATTERN)   # line 216
    # ...
}

I get this warning: Use of uninitialized value $url in pattern match (m//) at ./myScript.pl line 216.

What I don't understand is this: if the next statement isn't executed, then I have a match. If I have a match, $1 should contain some URL string, but instead it's uninitialized. Why is that?

Upvotes: 1

Views: 321

Answers (1)

Sobrique

Reputation: 53508

You're mixing up two things. A 'match' is a boolean test: does this piece of text match a particular pattern?

if ($urlCandidate !~ $URL_PATTERN)

This only tests whether the variable does (or, with !~, does not) match the pattern. It captures nothing, because the pattern contains no capturing parentheses.

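$1 holds the text captured by the first capture group of the last successful match. A capture group is the part of the pattern enclosed in parentheses ('stuff in brackets'). With no group in the pattern, $1 is never set, even though the match succeeds.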

So if you turn your URL pattern into:

qr/(http.*html)/

Then $1 will be defined.
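
As a minimal sketch of the asker's loop with that change applied: the sample contents of @urlCandidates and the @urls array are assumptions made up for illustration, and the second pattern check from the question is left out.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real @urlCandidates comes from the asker's script.
my @urlCandidates = (
    'see http://example.com/index.html for details',
    'no link here',
);

# Parentheses around the whole pattern turn it into capture group 1.
my $URL_PATTERN = qr/(http.*html)/;

my @urls;    # hypothetical collector, not in the original code
foreach my $urlCandidate (@urlCandidates) {
    # !~ still runs the match, so a successful match sets $1.
    next if $urlCandidate !~ $URL_PATTERN;
    my $url = $1;
    push @urls, $url;
}

print "$_\n" for @urls;
```

Only the first candidate matches, so only its URL ends up in @urls.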

Personally though, I don't like the whole $1 syntax, and tend to assign variables directly out of the pattern.

E.g.:

my ( $capture ) = ( $string =~ m/Content: (\w+)/ );

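You can still use this in a boolean expression. In scalar context a list assignment evaluates to the number of elements on its right-hand side, so the condition is true only when the match succeeded and returned its captures (here the match is against $_):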

if ( my ( $capture ) = m/pattern_match: (\w+)/ ) {
    print $capture;
}
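
A small sketch of how list-context matching behaves on success and on failure. The input string and the variable names $missing and $count are assumptions chosen for the example.

```perl
use strict;
use warnings;

my $string = 'Content: text';    # hypothetical input

# List context: a successful match returns the captured substrings.
my ($capture) = $string =~ m/Content: (\w+)/;

# A failed match returns the empty list, so the variable stays undef.
my ($missing) = $string =~ m/Length: (\d+)/;

# Assigning to an empty list in scalar context counts the returned values;
# this count is what an if () around a list assignment actually tests.
my $count = () = $string =~ m/Content: (\w+)/;

print "capture: $capture\n";
print "missing is undefined\n" unless defined $missing;
print "count: $count\n";
```

Because a failed match leaves the variable undefined rather than holding a stale value from an earlier match, this form tends to be harder to misuse than reading $1 directly.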

Or alternatively:

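use Data::Dumper;    # provides Dumper, used below

if ( $string =~ m/(?<capture>\w+)/ ) {
    print Dumper \%+;            # every named capture from the last match
    print $+{capture}, "\n";     # just the group named "capture"
}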
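
For comparison, a sketch of the named-capture form applied to the URL pattern from the question. The sample string and the group name url are assumptions for illustration. Named groups are also numbered, so $1 is set as well.

```perl
use strict;
use warnings;

my $string = 'see http://example.com/index.html for details';    # hypothetical input

my ($url, $same_as_dollar_one);
if ( $string =~ m/(?<url>http.*html)/ ) {
    $url                = $+{url};    # read the capture by name
    $same_as_dollar_one = $1;         # the same group, read by position
    print "$url\n";
}
```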

There is also a set of special match variables:

$`, $&, $'

$& The string matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval() enclosed by the current BLOCK).

$` The string preceding whatever was matched by the last successful pattern match, not counting any matches hidden within a BLOCK or eval enclosed by the current BLOCK.

$' The string following whatever was matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval() enclosed by the current BLOCK).

These each come with a caveat though:

http://perldoc.perl.org/perlvar.html#Performance-issues

Traditionally in Perl, any use of any of the three variables $` , $& or $' (or their use English equivalents) anywhere in the code, caused all subsequent successful pattern matches to make a copy of the matched string, in case the code might subsequently access one of those variables. This imposed a considerable performance penalty across the whole program, so generally the use of these variables has been discouraged.
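
The same perlvar page also documents ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}, which are filled in for matches that use the /p flag (Perl 5.10 and later). A minimal sketch, with a made-up input string and variable names:

```perl
use 5.010;
use strict;
use warnings;

my $string = 'see http://example.com/index.html for details';    # hypothetical input

my ($before, $matched, $after);
if ( $string =~ m/http.*html/p ) {    # /p preserves the matched string for this match
    ($before, $matched, $after) = ( ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} );
    print "before:  '$before'\n";
    print "matched: '$matched'\n";
    print "after:   '$after'\n";
}
```

This gives the whole matched text without adding parentheses to the pattern, though for the question as asked a capture group is the more direct fix.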

Upvotes: 4
