Reputation: 6355
My code snippet:
my $URL_PATTERN = qr/http.*html/;
foreach my $urlCandidate (@urlCandidates)
{
    if ($urlCandidate !~ $URL_PATTERN)
    {
        next;
    }
    my $url = $1;
    if ($url !~ $SOME_OTHER_PATTERN) # line 216
    # ...
}
I get this warning:
Use of uninitialized value $url in pattern match (m//) at ./myScript.pl line 216.
What I don't understand is this: if the next instruction isn't executed, then I have a match. If I have a match, $1 should contain some URL string. But instead it's uninitialized. Why is that?
Upvotes: 1
Views: 321
Reputation: 53508
You're mixing up two things. A 'match' is a boolean test: does this piece of text match a particular pattern?
if ($urlCandidate !~ $URL_PATTERN)
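This only tests whether the variable does (not) match the pattern.
$1 holds the text captured by the first capture group, which is how you select parts of a pattern. A capture group is the 'stuff in brackets'; your pattern has none, so $1 stays undefined even when the match succeeds.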
So if you turn your URL pattern into:
qr/(http.*html)/
Then $1 will be defined after a successful match.
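As a minimal sketch of that fix applied to the loop from the question: the contents of @urlCandidates below are made-up sample data (the original list isn't shown), and the second pattern check is left out.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real @urlCandidates is not shown in the question.
my @urlCandidates = (
    'see http://example.com/index.html for details',
    'no link here',
);

# The parentheses turn the whole URL into capture group 1.
my $URL_PATTERN = qr/(http.*html)/;

my @urls;
foreach my $urlCandidate (@urlCandidates) {
    next if $urlCandidate !~ $URL_PATTERN;
    my $url = $1;    # defined now, because the pattern captures something
    push @urls, $url;
}

print "$_\n" for @urls;
```

Only the first candidate matches, so @urls ends up with a single entry.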
Personally though, I don't like the whole $1 syntax, and tend to assign variables directly out of the pattern.
E.g.:
my ( $capture ) = ( $string =~ m/Content: (\w+)/ );
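You can still use this in a boolean expression (a list assignment in scalar context returns the number of values on its right-hand side, so it is true only when the match succeeds):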
if ( my ( $capture ) = m/pattern_match: (\w+)/ ) {
    print $capture;
}
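Applied to the loop from the question, direct assignment might look like the sketch below. The sample strings are assumptions, and the question's second pattern check is omitted.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the question's @urlCandidates.
my @urlCandidates = (
    '<a href="http://example.com/a.html">',
    'plain text',
);

my $URL_PATTERN = qr/(http.*html)/;

my @found;
foreach my $urlCandidate (@urlCandidates) {
    # In list context the match returns its captures, so $url receives
    # group 1 on success and stays undef when the match fails.
    my ($url) = $urlCandidate =~ $URL_PATTERN;
    next unless defined $url;
    push @found, $url;
}

print "$_\n" for @found;
```

This never touches $1, so there is no risk of reading a capture variable that the pattern didn't set.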
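Or alternatively, with a named capture group:
use Data::Dumper;    # provides Dumper
if ( $string =~ m/(?<capture>\w+)/ ) {
    print Dumper \%+;              # every named capture from the last match
    print $+{capture}, "\n";
}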
Alternatively, there's a set of match variables:
$`, $&, $'
$& The string matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval() enclosed by the current BLOCK).
$` The string preceding whatever was matched by the last successful pattern match, not counting any matches hidden within a BLOCK or eval enclosed by the current BLOCK.
$' The string following whatever was matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval() enclosed by the current BLOCK).
These each come with a caveat though:
http://perldoc.perl.org/perlvar.html#Performance-issues
Traditionally in Perl, any use of any of the three variables $` , $& or $' (or their use English equivalents) anywhere in the code, caused all subsequent successful pattern matches to make a copy of the matched string, in case the code might subsequently access one of those variables. This imposed a considerable performance penalty across the whole program, so generally the use of these variables has been discouraged.
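The same perlvar section also documents per-match equivalents, ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}, which are populated when the match uses the /p modifier (Perl 5.10 and later) and avoid the program-wide penalty. A small sketch, with a made-up input string:

```perl
use strict;
use warnings;

# Hypothetical input, for illustration only.
my $text = 'see http://example.com/index.html for details';

my ( $before, $matched, $after );
if ( $text =~ /http.*html/p ) {
    # /p asks Perl to preserve the matched string for this match only.
    ( $before, $matched, $after ) = ( ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} );
    print "before:  [$before]\n";
    print "matched: [$matched]\n";
    print "after:   [$after]\n";
}
```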
Upvotes: 4