Reputation: 951
This is driving me nuts!
I read a txt file into a string called $filestring.
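use Fcntl qw(O_RDONLY);    # provides the O_RDONLY constant
sysopen(my $fh, $filepath, O_RDONLY) or die "Cannot open $filepath: $!";
local $/ = undef;          # slurp mode: read the whole file in one go
my $filestring = <$fh>;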
I have a pattern variable called $regex, which is generated dynamically but always has the form:
(a)|(b)|(c)
I search the text for two of these patterns separated by a whitespace character:
while ($filestring =~ m/($regex)\s($regex)/g) {
    print "Match: $1 $2\n";
    #...more stuff
}
Most of the matches are valid, but for some reason I get a match like the following every once in a while:
Match: and
whereas a normal match should have two outputs like the following:
Match: , and
Does anyone know what might be causing this?
EDIT: it appears that the NULL character is being matched in the pattern.
Upvotes: 1
Views: 2300
Reputation: 781058
Each of the alternatives in your regexp is a separate capture group. The whole regexp looks like:
((a)|(b)|(c))\s((a)|(b)|(c))
12   3   4     56   7   8
I've annotated each opening parenthesis with its capture group number.
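So if $filestring is "b a", $1 will be b, but $2 will be empty (strictly, undef, which prints as an empty string) because group 2 is the first (a), and it matched nothing. The second word ends up in $5, not $2.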
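You can check the numbering yourself with a quick sketch like the one below. The pattern and the input "b a" are just the toy values from this answer, not your real data. In list context a match returns every capture group in order, with undef for groups that did not take part in the match.

```perl
use strict;
use warnings;

# Toy stand-ins for the dynamically generated pattern and the file contents.
my $regex      = '(a)|(b)|(c)';
my $filestring = 'b a';

# In list context, a successful match returns ($1, $2, $3, ...).
my @groups = $filestring =~ m/($regex)\s($regex)/;

for my $i (0 .. $#groups) {
    my $value = defined $groups[$i] ? "'$groups[$i]'" : 'undef';
    print '$', $i + 1, " = $value\n";
}
```

Here $1 comes out as 'b', $2 as undef, and the second word shows up in $5.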
To avoid this, you should use non-capturing groups for the alternatives:
((?:a)|(?:b)|(?:c))\s((?:a)|(?:b)|(?:c))
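Since $regex is built dynamically, a variation on the same idea is to wrap the whole alternation in a single non-capturing group when you build it. The following is only a sketch: the @alternatives list and the sample text are made up for illustration, and it assumes the alternatives are literal strings (hence quotemeta).

```perl
use strict;
use warnings;

# Hypothetical alternatives; the real list is generated elsewhere.
my @alternatives = (',', 'and', 'or');

# Escape each literal, join with |, and wrap in ONE non-capturing group
# so the pattern adds no capture groups of its own.
my $alt   = join '|', map { quotemeta } @alternatives;
my $regex = qr/(?:$alt)/;

# Made-up sample text standing in for the file contents.
my $filestring = 'apples , and pears';

my @matches;
while ($filestring =~ m/($regex)\s($regex)/g) {
    # Only the two explicit parentheses capture, so $1 and $2
    # are the first and second word of each match.
    push @matches, [$1, $2];
    print "Match: $1 $2\n";
}
```

With this sample text the loop finds one match and prints "Match: , and".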
Upvotes: 6