jenniem001

Problem with Regex

I have a Perl script that reads in .txt files and compares their words to the words in another .txt file. If a word matches, the file gets copied to another folder.

I'm currently getting this error:

Unmatched ( in regex; marked by <-- HERE in m/\b( <-- HERE who\b/ at filter.pl line 45.

Line 45 of my Perl file is:

if ($x =~ m/\b$word\b/) {

I don't know if it has anything to do with the rest of the file, but I'll put my code up just in case:

$dirtoget="/Users/jennie/crimes/";
opendir(IMD, $dirtoget) || die("Cannot open directory");
@thefiles= readdir(IMD);

foreach $f (@thefiles){
    if ($f =~ m/.txt/){
        #print "matches a txt file\n";
        #print $f;
        open (FILE, "/Users/jennie/crimes/$f")or die"Cannot open FILE";

        if ( FILE eq "" ) {

            close FILE;
        }
        else{
            # print "In the Else\n";
            while (<FILE>) {
                foreach $word(split) {
                    foreach $x (@triggers) {
                        if ($x =~ m/\b$word\b/) {

                            print $word,"\n";
                            print $f,"\n";

                            copy("/Users/jennie/crimes/$f","/Users/jennie/crimeStories/$f")or die "Copy failed: $!";
                            close FILE;
                        }
                    }

                }
            }
        }
    }
}
closedir(IMD);
exit 0;

The error doesn't make much sense to me; I'm far from a whiz at regular expressions :-(

Answers (2)

Vivin Paliath

This is probably happening because $word contains a regex metacharacter: in this case (, which opens a capturing group. Because $word is interpolated straight into the pattern, any metacharacters it contains are treated as regex syntax, which can break the pattern (here the group is never closed). You can use \Q and \E to make sure that the contents of $word are "quoted" so that they will not be interpreted as metacharacters:

$x =~ m/\b\Q$word\E\b/
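A minimal sketch of what \Q...\E actually does, assuming the offending token is (who as in the error message (split() keeps punctuation attached to words). \Q applies Perl's built-in quotemeta to the interpolated text, so the ( is backslash-escaped and matched literally. The trigger string is hypothetical, and \b is left out here because (who does not start with a word character.

```perl
use strict;
use warnings;

my $word    = '(who';               # a token as split() might return it
my $trigger = 'said (who knows)';   # hypothetical trigger text

# quotemeta() is the function form of \Q...\E: it backslash-escapes
# every non-word character, so "(" becomes "\(".
my $escaped = quotemeta $word;
print "$escaped\n";                 # prints \(who

# With \Q...\E the "(" is matched literally instead of opening a group.
my $matched = $trigger =~ m/\Q$word\E/ ? 1 : 0;
print "matched '$word'\n" if $matched;
```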

There is more information here.

EDIT

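Based on tchrist's comment, the \b wouldn't make sense in this context unless you can ensure that $word begins and ends with word characters (letters, digits, or underscore). \b asserts a transition between a word character and a non-word character, so a \b in front of "(who" only matches when the ( is immediately preceded by a word character. But in general, to get around your problem, use: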

$x =~ m/\Q$word\E/
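If you still want whole-word matching when $word may begin or end with punctuation, one common alternative (not part of the answer or tchrist's comment) is to replace \b with lookarounds: (?<!\w) and (?!\w) assert that no word character sits immediately before or after the match, whatever characters $word itself starts or ends with. A sketch, with a hypothetical helper named contains_word:

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if $text contains $word as a standalone
# token, i.e. not immediately preceded or followed by a word character.
sub contains_word {
    my ($text, $word) = @_;
    return $text =~ m/(?<!\w)\Q$word\E(?!\w)/ ? 1 : 0;
}

my $w = '(who';

# With \b the match fails: there is no word boundary between " " and "(".
my $with_b = 'said (who knows)' =~ m/\b\Q$w\E\b/ ? 1 : 0;

print "with \\b:       $with_b\n";                                    # 0
print "lookarounds:   ", contains_word('said (who knows)', $w), "\n";  # 1
print "inside a word: ", contains_word('whoever', 'who'), "\n";        # 0
print "whole word:    ", contains_word('who is there', 'who'), "\n";   # 1
```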

Anon.

You're interpolating the contents of $word directly into the regex. This means any metacharacters in $word will be interpreted as metacharacters, potentially breaking your regex.

If you want to match the literal contents of $word, use \Q and \E:

$x =~ m/\b\Q$word\E\b/
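A small sketch that reproduces the error and the fix side by side, assuming the token is (who as in the question's error message. eval traps the runtime die caused by compiling an unbalanced interpolated pattern; the trigger text is hypothetical.

```perl
use strict;
use warnings;

my $word    = '(who';           # token from the error message
my $trigger = 'who is there';   # hypothetical trigger text

# Unquoted: "(" opens a capturing group that is never closed, so
# compiling the interpolated pattern dies at runtime; eval traps it.
my $ok  = eval { $trigger =~ m/$word/; 1 };
my $err = $ok ? '' : $@;
my $msg = $ok ? "compiled\n" : "error: $err";
print $msg;

# Quoted: \Q...\E escapes "(", so the pattern compiles and just looks
# for the literal text "(who", which this trigger does not contain.
my $found = $trigger =~ m/\Q$word\E/ ? 1 : 0;
print "found: $found\n";
```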

Additionally, as @goreSplatter mentioned in the comments, you've got another broken regex further up in your script:

$f =~ /.txt/

First, . is a metacharacter that matches any character (other than a newline). Second, the pattern isn't anchored, so the match succeeds if that character sequence appears anywhere in the filename. For example, it will match "thisisnotatxtfile.bin".

You can use the File::Basename module to extract just the file extension and test it, or you can modify the regex by escaping the . and anchoring it to the end of the filename:

$f =~ /\.txt$/
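A quick check of both patterns against a few hypothetical filenames, plus the File::Basename route using its fileparse function with a suffix pattern (the qr/\.[^.]*/ form shown in that module's documentation). This sketch anchors with \z rather than the answer's $, since $ also matches just before a trailing newline and \z is the stricter choice.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my @names = ('story.txt', 'thisisnotatxtfile.bin', 'notes.txt.bak');
my %result;

for my $f (@names) {
    my $loose    = $f =~ /.txt/    ? 'yes' : 'no';  # original: any char + "txt", anywhere
    my $anchored = $f =~ /\.txt\z/ ? 'yes' : 'no';  # literal ".txt" at the very end

    # fileparse() returns (name, directories, suffix); the suffix is the
    # part at the end of the name matched by the given pattern.
    my (undef, undef, $suffix) = fileparse($f, qr/\.[^.]*/);

    $result{$f} = [$loose, $anchored, $suffix];
    printf "%-22s loose=%-3s anchored=%-3s suffix=%s\n",
        $f, $loose, $anchored, $suffix;
}
```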
