Reputation: 1020
I'm trying to check whether a word such as *FOO (with * as a literal character) appears in a line. My input is C++ source code. I need to use a pre-compiled regex for this because of program flow requirements, so I tried the following:
$pattern = qr/[^a-zA-Z](\*FOO)[^a-zA-Z]|^\s*(\*FOO)[^a-zA-Z]/;
And I use it like this:
if ($line =~ m/$pattern/) { ... }
It works and catches lines containing *FOO, such as hey *FOO.BAR, but it also matches lines such as:
//FOO programming using stuff and things
which I want to ignore. What am I missing? Is \* not the right way to escape * in a pre-compiled regex in Perl? If *FOO is stored in $word and the pattern looks like this:
$pattern = qr/[^a-zA-Z](\\$word)[^a-zA-Z]|^\s*(\\$word)[^a-zA-Z]/;
Is that different from the previous pattern? Because I tried both and the result seems to be the same.
I found a way to bypass this problem by removing the first char of $word and escaping * in the pattern, but if $word = "**.?FOO" for example, how do I create a qr// with $word so that all the meta-characters are escaped?
Upvotes: 1
Views: 207
Reputation: 66883
You do need to escape the *. One way to do it is with quotemeta, via the \Q escape:
use warnings;
use strict;
my $qr = qr/\Q*FOO/;
while (<DATA>) { print if /$qr/ }
__DATA__
//FOO programming using stuff and things
hey *FOO.BAR
Note that this escapes all ASCII non-"word" characters through the rest of the pattern. If you need to limit its action to only a part of the pattern, stop it using \E. Please see linked docs.
The above determines whether *FOO is in the line, regardless of whether it is a word or part of one. It is not clear to me which is needed. Once that is specified, the pattern can be adjusted.
Note that /\*FOO/ works, too. What you tried probably failed because of everything else you are trying to match, the purpose of which I do not understand. If you only need to detect whether the pattern is present, the above does it. If there is a more specific requirement, please clarify.
As for the examples: for me that string //FOO... is not matched by the main (first) $pattern you show. The second one does interpolate $word, but the \\ in front of it is a literal backslash rather than an escape for the *, and the pattern is in any case much too convoluted. Regexes can really tie one in nasty knots when pushed; I suggest keeping it as simple as possible.
Upvotes: 1
Reputation: 253
my $word = '*FOO';
my $pattern = qr/\\$word/;
is equivalent to
my $pattern = qr/\\*FOO/; # zero or more '\' followed by 'FOO'
The $word is simply interpolated as is.
To get something equivalent to
my $pattern = qr/\*FOO/;
you should use
my $word = '*FOO';
my $pattern = qr/\Q$word\E/;
By default, an interpolated variable is treated as a mini regular expression: metacharacters in the variable such as *, +, ? are still interpreted as metacharacters. \Q...\E adds a backslash before any ASCII character not matching /[A-Za-z_0-9]/, so any metacharacters in the interpolated variable are interpreted as literal characters. Refer to perldoc.
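As a minimal sketch of the \Q...\E approach applied to the kind of pattern in the question: the helper name make_pattern and the sample lines are made up for illustration, and the two alternatives from the question are folded into one (start of line or a non-letter before the word, a non-letter after it), which is an assumption about the intent.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pre-compiled regex that matches $word literally,
# requiring start of line or a non-letter before it and a non-letter after it.
sub make_pattern {
    my ($word) = @_;
    # \Q...\E quotes every metacharacter in $word before the regex is compiled.
    return qr/(?:^|[^a-zA-Z])(\Q$word\E)[^a-zA-Z]/;
}

my $pattern = make_pattern('**.?FOO');

for my $line ('hey **.?FOO.BAR', '//FOO programming using stuff and things') {
    printf "%-45s %s\n", $line, ($line =~ $pattern ? 'match' : 'no match');
}
```

Here only the first sample line should match, since the second does not contain the literal text **.?FOO.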
I tried
my $pattern = qr/[^a-zA-Z](\*FOO)[^a-zA-Z]|^\s*(\*FOO)[^a-zA-Z]/;
my $line = '//FOO programming using stuff and things';
if($line =~ m/$pattern/){
print "$&\n";
}
else{
print "No match!";
}
and it printed "No match!". I can't explain how you got it to match.
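One possible explanation, offered only as a guess about what was actually run: if the match came from the second pattern in the question (the one built with \\$word), then \\* means "zero or more backslashes", so a bare FOO with non-letters on both sides matches. A small check, with variable names chosen for illustration:

```perl
use strict;
use warnings;

my $word = '*FOO';
my $line = '//FOO programming using stuff and things';

# Second pattern from the question: \\ is a literal backslash, and the * that
# comes from $word quantifies it, so (\\$word) means "zero or more backslashes, then FOO".
my $unescaped = qr/[^a-zA-Z](\\$word)[^a-zA-Z]|^\s*(\\$word)[^a-zA-Z]/;

# Same structure, but with $word quoted so that * is a literal character.
my $escaped = qr/[^a-zA-Z](\Q$word\E)[^a-zA-Z]|^\s*(\Q$word\E)[^a-zA-Z]/;

print "unescaped: ", ($line =~ $unescaped ? 'match' : 'no match'), "\n";
print "escaped:   ", ($line =~ $escaped   ? 'match' : 'no match'), "\n";
```

The unescaped version matches the //FOO line, while the \Q...\E version does not.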
Upvotes: 1