Lebowski156

Reputation: 951

Perl pattern matching "nothing"/empty

This is driving me nuts!

  1. I read a txt file into a string called $filestring.

    sysopen(handle, $filepath, O_RDONLY) or die "WHAT?";
    local $/ = undef;
    my $filestring = <handle>;
    
  2. I made a pattern variable called $regex which is generated dynamically, but takes on the format:

    (a)|(b)|(c)
    
  3. I search the text for patterns separated by a space

    while($filestring =~ m/($regex)\s($regex)/g){
       print "Match: $1 $2\n";
       #...more stuff
    }
    

Most of the matches are valid, but for some reason I get a match like the following every once in a while:

Match: and 

whereas a normal match should have two outputs like the following:

Match: , and

Does anyone know what might be causing this?

EDIT: it appears that the NULL character is being matched in the pattern.

Upvotes: 1

Views: 2300

Answers (1)

Barmar

Reputation: 781058

Each of the alternatives in your regexp is a separate capture group. The whole regexp looks like:

    ((a)|(b)|(c))\s((a)|(b)|(c))
    12   3   4     56   7   8

I've notated it with the capture group number for each piece of the regexp.

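So if $filestring is "b a", $1 will be "b", but $2 will be undefined (which prints as an empty string) because group 2, the first (a), did not take part in the match. The second word ends up in $5, not $2.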
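To see the numbering in action, here is a small sketch that prints every capture group for the input "b a". The pattern string and the helper name capture_groups are stand-ins I made up for illustration; your real $regex is generated elsewhere.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the dynamically generated pattern.
my $regex = '(a)|(b)|(c)';

# Return all eight capture groups for the first match in $text.
# In list context a match returns ($1, $2, ...), with undef for
# any group that did not take part in the match.
sub capture_groups {
    my ($text) = @_;
    my @groups = $text =~ m/($regex)\s($regex)/;
    return @groups;
}

my @g = capture_groups('b a');
for my $i (0 .. $#g) {
    printf "\$%d = %s\n", $i + 1, defined $g[$i] ? "'$g[$i]'" : 'undef';
}
```

For this input, $1 and $3 hold 'b', $5 and $6 hold 'a', and the rest (including $2) are undef.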

To avoid this, you should use non-capturing groups for the alternatives:

    ((?:a)|(?:b)|(?:c))\s((?:a)|(?:b)|(?:c))
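Since your pattern is built dynamically, one way to do this is to wrap each alternative in (?:...) when you generate it. This is only a sketch: the @alternatives list and the find_pairs helper are hypothetical, and I'm assuming the alternatives are literal strings (hence quotemeta).

```perl
use strict;
use warnings;

# Hypothetical alternatives; the real list is generated elsewhere.
my @alternatives = (',', 'and', 'or');

# Build the pattern with non-capturing groups, escaping any
# regex metacharacters in the literal alternatives.
my $regex = join '|', map { '(?:' . quotemeta($_) . ')' } @alternatives;

# Collect every pair of alternatives separated by one whitespace character.
sub find_pairs {
    my ($text) = @_;
    my @pairs;
    while ($text =~ m/($regex)\s($regex)/g) {
        push @pairs, [$1, $2];    # only the two outer groups capture now
    }
    return @pairs;
}

print "Match: $_->[0] $_->[1]\n" for find_pairs('x , and y');
```

With only the two outer groups capturing, $1 and $2 are always the first and second word of the match.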

Upvotes: 6
