Kyle

Reputation: 4376

Why does this regular expression match the second item and not the first?

Suppose I have the following regular expression:

/BAR|FOO BAR/gi

And the following input string: "FOO BAR"

I would expect to get a match on "BAR", but I actually get a match on "FOO BAR". Why is this?

Upvotes: 2

Views: 159

Answers (2)

Glitch Desire

Reputation: 15043

Regex returns the match that starts FIRST in the string

First of all, let's examine your regular expression:

"/BAR|FOO BAR/gi"

This searches the input string for either BAR or FOO BAR. The flags (assuming your regex implementation follows the usual conventions) are 'global' and 'case insensitive':

  1. Global flag means that the expression will attempt to return all matches in the haystack.
  2. Case insensitive flag means that the expression will match regardless of case.
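As a quick illustration of the two flags, here is a minimal Perl sketch. The sample string and variable names are made up for this example and are not part of the question.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only to show the flags at work.
my $haystack = "foo Foo FOO";

# /g in list context returns every non-overlapping match.
# Without /i the comparison is case sensitive.
my @global = $haystack =~ /foo/g;

# Adding /i makes the comparison ignore case.
my @global_case = $haystack =~ /foo/gi;

print scalar(@global), "\n";       # 1: only the lowercase 'foo' matches
print scalar(@global_case), "\n";  # 3: 'foo', 'Foo' and 'FOO'
```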

Let's try a few things in order to understand how matching works (Note: I'm using Perl because it's one of the most widely used regex implementations, but these examples should work in your language if its regex support is compatible):

use warnings;
use strict;

my $string = "FOO BAR";

if ($string =~ /FOO/) { print "1. True\n"; }  # 'FOO' matches in string
if ($string =~ /BAR/) { print "2. True\n"; }  # 'BAR' matches in string
if ($string =~ /foo/i) { print "3. True\n"; } # 'foo' matches in string, ignoring case

This prints True for all three statements, demonstrating that FOO and BAR both match the string, and that foo matches as well once the ignore-case flag is set.

So, why is your regex matching 'FOO BAR' instead of 'BAR'?

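Because the regex engine reports the leftmost match: it tries every alternative at the first position in the string before moving on to the next position. At position 0, BAR fails but FOO BAR succeeds, so the search ends there and never reaches the BAR at position 4. The order of the alternatives only matters when more than one of them can match at the same starting position.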

my $string = "FOO BAR";

$string =~ /(BAR|FOO BAR)/;
print $1; # Prints 'FOO BAR', even though BAR is listed first

Note that setting /g does not cause both to match, because it applies the ENTIRE rule /BAR|FOO BAR/ as many times as possible, rather than matching each side of the rule separately. Once 'FOO BAR' has matched, the next search starts after the end of that match, so the BAR inside it is never matched on its own.
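To check the /g behaviour with the pattern from the question, the sketch below collects every match in list context. The variable names and the second input string are illustrative only.

```perl
use strict;
use warnings;

my $string = "FOO BAR";

# Without capture groups, a /g match in list context returns
# every matched substring.
my @matches = $string =~ /BAR|FOO BAR/gi;

print scalar(@matches), "\n";    # 1
print "$matches[0]\n";           # FOO BAR

# Hypothetical second input: here a standalone BAR starts at offset 0,
# so it is found first, and FOO BAR is found by the next search.
my @more = "BAR FOO BAR" =~ /BAR|FOO BAR/gi;

print join(", ", @more), "\n";   # BAR, FOO BAR
```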

What would you do if you wanted to match both FOO BAR and BAR?

This regular expression would capture both 'FOO BAR' and 'BAR' from your input string by nesting one capture group inside another:

my $string = "FOO BAR";

$string =~ /(FOO (BAR))/;
print "$1\n"; # Prints 'FOO BAR'
print $2;     # Prints 'BAR'

Demonstration of the /g flag in context

With the /g flag, this loop finds FOO and then BAR as two successive matches:

my $string = "FOO BAR";

while($string =~ /(FOO|BAR)/g) {
    print "$1\n";
}

To match what you're looking for...

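For your input string, this example captures 'FOO BAR' in $1, FOO followed by whitespace in $2, and 'BAR' in $3. If a BAR appears without a preceding FOO, the optional group does not take part and $2 is undefined.

my $string = "FOO BAR";

while ($string =~ /((FOO\s)?(BAR))/g) {
    print "$1\n";                # Whole match: 'FOO BAR'
    print "$2\n" if defined $2;  # 'FOO ' (skipped when the optional group did not match)
    print "$3\n";                # 'BAR'
}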
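If you would rather keep the original alternation and still see both results, one option is to wrap it in a zero-width lookahead. This is a different technique from the capture groups above, offered only as a sketch: because the lookahead consumes no characters, the search resumes one character further on, and the first alternative that succeeds at each starting position is reported even when the matches overlap. The @found array is a name invented for this example.

```perl
use strict;
use warnings;

my $string = "FOO BAR";
my @found;

# (?=...) asserts that the alternation matches here without consuming
# any text; the capture group inside it still fills $1.
while ($string =~ /(?=(BAR|FOO BAR))/gi) {
    push @found, $1;
}

print join(", ", @found), "\n";   # FOO BAR, BAR
```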

Note: I have removed irrelevant flags from examples so as not to confuse future readers with similar issues.

Upvotes: 6

Niet the Dark Absol

Reputation: 324750

A regex search starts at the beginning of the string. It sees the F, and tries to match it against the BAR option. This of course fails. It then tries the FOO BAR option, and the F fits, so it runs with that to find out if the rest fits too. Sure enough, it does, and so the match is FOO BAR.
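That walk-through can be imitated by hand. The sketch below is a simplification and not how a real engine is implemented: it tries each alternative in turn at each offset, using \G to pin the attempt to the position set with pos(). The @alternatives and @trace names are invented for this example.

```perl
use strict;
use warnings;

my $string       = "FOO BAR";
my @alternatives = ('BAR', 'FOO BAR');   # same order as in the question
my @trace;

OFFSET: for my $start (0 .. length($string) - 1) {
    for my $alt (@alternatives) {
        pos($string) = $start;           # where \G will anchor
        if ($string =~ /\G\Q$alt\E/gci) {
            push @trace, "offset $start: '$alt' matches";
            last OFFSET;                 # leftmost match found, stop searching
        }
        push @trace, "offset $start: '$alt' fails";
    }
}

print "$_\n" for @trace;
# offset 0: 'BAR' fails
# offset 0: 'FOO BAR' matches
```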

Upvotes: 5
