yoniyes

Pre-compiled regex with special characters matching

I'm trying to check whether a word such as *FOO (with * as a literal character) appears in a line. My input is C++ source code. I need to use a pre-compiled regex for this because of program flow requirements, so I tried the following:

$pattern = qr/[^a-zA-Z](\*FOO)[^a-zA-Z]|^\s*(\*FOO)[^a-zA-Z]/;

And I use it like this:

if ($line =~ m/$pattern/) { ... }

It works and catches lines containing *FOO, such as hey *FOO.BAR, but it also matches lines such as:

//FOO programming using stuff and things

which I want to ignore. What am I missing? Is \* not the right way to escape * in a pre-compiled regex in Perl? If *FOO is stored in $word and the pattern looks like this:

$pattern = qr/[^a-zA-Z](\\$word)[^a-zA-Z]|^\s*(\\$word)[^a-zA-Z]/;

Is that different from the previous pattern? Because I tried both and the result seems to be the same.

I found a way to work around this problem by removing the first character of $word and escaping the * in the pattern myself, but if $word = "**.?FOO", for example, how do I build a qr// from $word so that all of its metacharacters are escaped?


Answers (2)

zdim

You do need to escape the *. One way to do it is with the quotemeta escape \Q:

use warnings;
use strict;

my $qr = qr/\Q*FOO/;

while (<DATA>) { print if /$qr/ }

__DATA__
//FOO programming using stuff and things
hey *FOO.BAR

Note that this escapes all ASCII non-word characters through the rest of the pattern. If you need to limit its effect to only part of the pattern, end it with \E. See quotemeta in perlfunc for details.
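A minimal sketch of scoping \Q with \E, using the **.?FOO example from the question. The requirement that the word be followed by optional whitespace and an opening parenthesis (as in a function call) is a hypothetical one chosen only to show that the \s*\( after \E is ordinary regex syntax again, while everything between \Q and \E is matched literally.

```perl
use strict;
use warnings;

# Word from the question, full of regex metacharacters.
my $word = '**.?FOO';

# \Q...\E escapes only the interpolated word; after \E, \s* and \(
# are normal regex syntax (hypothetical "function call" requirement).
my $call = qr/\Q$word\E\s*\(/;

for my $line ('x = **.?FOO (1);', 'x = **.?FOO;', 'x = abFOO (1);') {
    printf "%-18s %s\n", $line, ($line =~ $call ? 'match' : 'no match');
}
```

Only the first sample line matches: the second has no parenthesis, and the third lacks the literal **.? prefix.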

The above determines whether *FOO is in the line, regardless of whether it is a word on its own or part of one. It is not clear to me which one you need; once that is specified, the pattern can be adjusted.
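If *FOO should only count when it is not glued to surrounding letters (my reading of the [^a-zA-Z] parts of the question's pattern, which is an assumption), a sketch using lookarounds might look like this. \b is not useful at the start because * is not a word character; unlike the question's pattern, this version also accepts the word at the very start or end of the line.

```perl
use strict;
use warnings;

my $word = '*FOO';

# Require that no ASCII letter sits directly before or after the
# literal word; lookarounds also allow start/end of the line.
my $token = qr/(?<![A-Za-z])\Q$word\E(?![A-Za-z])/;

for my $line ('hey *FOO.BAR', '*FOO', 'a*FOOBAR',
              '//FOO programming using stuff and things') {
    printf "%-42s %s\n", $line, ($line =~ $token ? 'match' : 'no match');
}
```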

Note that /\*FOO/ works, too. What you tried probably failed because of everything else you are trying to match, whose purpose I do not understand. If you only need to detect whether the string is present, the above does it; if there is a more specific requirement, please clarify.
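A small sketch comparing three spellings of the same literal match (escaping by hand, \Q inside the pattern, and calling quotemeta before interpolating) against the two sample lines from the question. The hash keys are just illustrative labels.

```perl
use strict;
use warnings;

my @lines = ('//FOO programming using stuff and things', 'hey *FOO.BAR');

my $escaped = quotemeta('*FOO');    # the string \*FOO

# Three spellings of a literal match on '*FOO'.
my %patterns = (
    manual        => qr/\*FOO/,     # escape the metacharacter by hand
    inline_Q      => qr/\Q*FOO/,    # \Q quotes up to the end of the pattern
    via_quotemeta => qr/$escaped/,  # escape first, then interpolate
);

for my $name (sort keys %patterns) {
    my @hits = grep { $_ =~ $patterns{$name} } @lines;
    print "$name: @hits\n";
}
```

All three report only hey *FOO.BAR.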


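As for the examples: for me, the string //FOO... is not matched by the main (first) $pattern you show. The second one does interpolate $word, but the \\ in front of it is a literal backslash, so with $word = '*FOO' it becomes \\*FOO, that is, zero or more backslashes followed by FOO, which does match //FOO. It is also much too convoluted. Regexes can really tie one in nasty knots when pushed, so I suggest keeping them as simple as possible.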


ltux

Question 1:

my $word = '*FOO';
my $pattern = qr/\\$word/;

is equivalent to

my $pattern = qr/\\*FOO/; # zero or more '\' followed by 'FOO'

The value of $word is simply interpolated as is, so its * becomes a quantifier on the preceding \\.
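A quick way to confirm this is to compile both spellings and compare their stringified forms. This is a sketch; the exact text printed for a qr// object depends on the Perl version (modern Perls use the (?^:...) form), but the two strings should be identical.

```perl
use strict;
use warnings;

my $word = '*FOO';

# \\ is a literal backslash in the regex; $word is pasted in unchanged,
# so its '*' quantifies that backslash.
my $interpolated = qr/\\$word/;
my $written_out  = qr/\\*FOO/;

print "$interpolated\n";
print "$written_out\n";
print "same pattern\n" if "$interpolated" eq "$written_out";

# Zero backslashes are allowed, so plain FOO (and //FOO) match.
print "matches //FOO\n" if '//FOO' =~ $interpolated;
```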

To get something equivalent to

my $pattern = qr/\*FOO/;

you should use

my $word = '*FOO';
my $pattern = qr/\Q$word\E/;

By default, an interpolated variable is treated as a mini regular expression: metacharacters in the variable, such as *, + and ?, are still interpreted as metacharacters. \Q...\E adds a backslash before every character not matching /[A-Za-z_0-9]/, so any metacharacters in the interpolated variable are interpreted literally. See quotemeta in perlfunc.
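A sketch of the difference for the **.?FOO example from the question. Interpolated without \Q, the leading * has nothing to quantify, so compiling the pattern dies at run time (caught here with eval); with \Q...\E the same string is escaped and matched literally.

```perl
use strict;
use warnings;

my $word = '**.?FOO';   # example from the question

# Without \Q the variable is regex syntax: '*' follows nothing,
# so the run-time compilation dies and eval returns undef.
my $raw = eval { qr/$word/ };
print defined $raw ? "raw pattern compiled\n" : "raw pattern failed: $@";

# With \Q...\E every non-word character is backslashed first.
my $safe = qr/\Q$word\E/;
print "escaped form: ", quotemeta($word), "\n";

for my $line ('x **.?FOO y', 'x ab?FOO y') {
    print "$line: ", ($line =~ $safe ? "match" : "no match"), "\n";
}
```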

Question 2:

I tried

my $pattern = qr/[^a-zA-Z](\*FOO)[^a-zA-Z]|^\s*(\*FOO)[^a-zA-Z]/;
my $line = '//FOO programming using stuff and things';

if($line =~ m/$pattern/){
    print "$&\n";
}
else{
    print "No match!";
}

and it printed "No match!". I can't explain how you got a match.
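One plausible explanation, following from Question 1, is that the match came from the $word variant of the pattern rather than the \*FOO one. The sketch below runs both patterns from the question against the same line; only the $word variant matches, because \\$word means zero or more backslashes followed by FOO.

```perl
use strict;
use warnings;

my $word = '*FOO';
my $line = '//FOO programming using stuff and things';

# First pattern from the question: \* is a literal asterisk.
my $literal = qr/[^a-zA-Z](\*FOO)[^a-zA-Z]|^\s*(\*FOO)[^a-zA-Z]/;

# Second pattern: \\ is a literal backslash and the '*' from $word
# makes it optional, so '/' + 'FOO' + ' ' is enough to match.
my $via_word = qr/[^a-zA-Z](\\$word)[^a-zA-Z]|^\s*(\\$word)[^a-zA-Z]/;

for my $pair (['literal', $literal], ['via $word', $via_word]) {
    my ($name, $re) = @$pair;
    print "$name: ", ($line =~ $re ? "matched '$&'" : 'No match!'), "\n";
}
```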

