Reputation: 4376
Suppose I have the following regular expression:
/BAR|FOO BAR/gi
And the following input string: "FOO BAR"
I would expect to get a match on "BAR", but I actually get a match on "FOO BAR". Why is this?
Upvotes: 2
Views: 159
Reputation: 15043
First of all, let's examine your regular expression:
"/BAR|FOO BAR/gi"
This searches for either BAR or FOO BAR in the input string. The flags (assuming Perl-compatible regex syntax) are 'global' (g) and 'case insensitive' (i).
Let's try a few things in order to understand how matching works. (Note: I'm using Perl because it's a widely used regex implementation, but these examples should work in your language if it's compatible.)
use warnings;
use strict;
my $string = "FOO BAR";
if ($string =~ /FOO/) { print "1. True\n"; } # 'FOO' matches in string
if ($string =~ /BAR/) { print "2. True\n"; } # 'BAR' matches in string
if ($string =~ /foo/i) { print "3. True\n"; } # 'foo' matches in string, ignoring case
This prints True for all three statements, demonstrating that FOO and BAR each match on their own, and that foo matches once the case-insensitive flag is set.
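So, why is your regex matching 'FOO BAR' instead of 'BAR'?
Because, as documented, the regex engine returns the leftmost match: it tries every alternative at the earliest position in the string before moving on to a later position. 'FOO BAR' can match at offset 0, while 'BAR' can only match at offset 4, so 'FOO BAR' wins regardless of the order in which the alternatives are written.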
my $string = "FOO BAR";
$string =~ /(FOO BAR|BAR)/;
print $1; # Prints 'FOO BAR'
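To check that the order of the alternatives does not change the outcome here, a minimal sketch can run both orderings against the same string. The variable names `$bar_first` and `$foo_first` are made up for this illustration.

```perl
use strict;
use warnings;

my $string = "FOO BAR";

# Asker's order: BAR is listed first, but it cannot match at offset 0,
# so the engine falls through to FOO BAR at that same position.
my ($bar_first) = $string =~ /(BAR|FOO BAR)/i;

# Reversed order: FOO BAR is listed first and matches at offset 0 directly.
my ($foo_first) = $string =~ /(FOO BAR|BAR)/i;

print "$bar_first\n";   # FOO BAR
print "$foo_first\n";   # FOO BAR
```

Both orderings report the same text because the match position is decided before the choice between alternatives matters.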
Note that setting /g does not cause both to match, because it matches the ENTIRE rule /FOO BAR|BAR/ as many times as possible, rather than matching each side of the rule separately. As soon as 'FOO BAR' is matched, the engine resumes searching after the end of that match, so the 'BAR' inside it has already been consumed.
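A small sketch of that behaviour, using /g in list context so every match is returned at once. The second input string, "FOO BAR, BAR", is invented here purely to show a case where a second match does exist.

```perl
use strict;
use warnings;

my $string = "FOO BAR";

# In list context, /g returns every non-overlapping match of the whole pattern.
my @matches = $string =~ /FOO BAR|BAR/g;
print scalar(@matches), "\n";   # 1
print "$matches[0]\n";          # FOO BAR

# Hypothetical second input: the trailing BAR lies outside the first match,
# so it is found on the next attempt.
my @more = "FOO BAR, BAR" =~ /FOO BAR|BAR/g;
print join(' / ', @more), "\n"; # FOO BAR / BAR
```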
What would you do if you wanted to match both FOO BAR and BAR?
This regular expression uses nested capture groups to capture both 'FOO BAR' and 'BAR' from your input string in a single match:
my $string = "FOO BAR";
$string =~ /(FOO (BAR))/;
print "$1\n"; # Prints 'FOO BAR'
print $2; # Prints 'BAR'
Demonstration of the /g flag in context
This, using the /g flag, would match FOO and then BAR as two separate matches:
my $string = "FOO BAR";
while ($string =~ /(FOO|BAR)/g) {
    print "$1\n";
}
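This example captures FOO BAR (group 1), FOO followed by a space (group 2) and BAR (group 3) for your input string. Because the FOO part is optional, group 2 is undefined whenever BAR appears without a preceding FOO, so it is defaulted to an empty string before printing.
my $string = "FOO BAR";
while ($string =~ /((FOO\s)?(BAR))/g) {
    # $2 is undef when the optional FOO group did not take part in the match
    print "$1\n", $2 // '', "\n$3\n";
}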
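To see how the optional group behaves when a BAR is not preceded by FOO, here is a hedged sketch on a made-up input, "BAR FOO BAR" (not from the question). The `@rows` array and the '(none)' placeholder are illustrative choices only.

```perl
use strict;
use warnings;

my $string = "BAR FOO BAR";   # hypothetical input, not from the question
my @rows;

while ($string =~ /((FOO\s)?(BAR))/g) {
    # $2 is undef when the optional FOO group did not participate,
    # so substitute a visible placeholder instead of printing undef.
    push @rows, [ $1, defined($2) ? $2 : '(none)', $3 ];
}

print join(' | ', @$_), "\n" for @rows;
```

The first row comes from the bare BAR at the start of the string, where group 2 is empty; the second row comes from FOO BAR, where group 2 holds "FOO " including its trailing space.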
Note: I have removed irrelevant flags from examples so as not to confuse future readers with similar issues.
Upvotes: 6
Reputation: 324750
A regex engine starts at the beginning of the string. It sees the F, and tries to match it against the BAR option. This of course fails. It then tries the FOO BAR option at that same position, and that seems to work, so it runs with that to find out if it works. Sure enough, it does, and so the match is FOO BAR. The engine never reaches the later position where BAR alone would match.
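The walk-through above can be checked by asking Perl where the match started and ended. This is a minimal sketch using the built-in `@-` and `@+` offset arrays; the names `$start`, `$end` and `$text` are made up for the example.

```perl
use strict;
use warnings;

my $string = "FOO BAR";
my ($start, $end, $text);

if ($string =~ /BAR|FOO BAR/i) {
    # $-[0] is the offset where the match begins, $+[0] the offset just past its end.
    ($start, $end) = ($-[0], $+[0]);
    $text = substr($string, $start, $end - $start);
    print "matched '$text' at offsets $start to $end\n";
}
```

A start offset of 0 confirms that the match began at the F, before the engine ever considered the position of the standalone BAR.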
Upvotes: 5