Neil McGuigan

Reputation: 48287

Matching optional groups

With the following code:

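    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            Pattern pattern = Pattern.compile("((foo) (bar)?)|((foo) (baz)?)");
            Matcher matcher = pattern.matcher("foo baz");

            if (matcher.find()) {
                // Print every capturing group; groups that did not participate print null
                for (int i = 1; i <= matcher.groupCount(); i++) {
                    System.out.println(matcher.group(i));
                }
            }
        }
    }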

The result is:

    foo 
    foo
    null
    null
    null
    null

Whereas I was hoping for:

    null
    null
    null
    foo baz
    foo
    baz

How can I match the second alternative?

I want to match the full "foo baz" if possible, and otherwise just "foo".

Upvotes: 0

Views: 65

Answers (2)

Pshemo

Reputation: 124275

This problem is very similar to a|aa, where aa never gets a chance to match anything, because the left side a

  • will be tried first
  • and will be able to match every single a (even the ones in "aa")
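A quick way to see this in Java is to collect every match that Matcher.find() reports for a|aa and for the reordered aa|a. This is only a small sketch; the class and method names (AlternationOrderDemo, allMatches) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationOrderDemo {
    // Collects every successive match that Matcher.find() reports in the input.
    static List<String> allMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "a" is tried first and always succeeds, so the "aa" branch is never used
        System.out.println(allMatches("a|aa", "aaaa")); // [a, a, a, a]
        // Putting the longer alternative first lets it win whenever it can match
        System.out.println(allMatches("aa|a", "aaaa")); // [aa, aa]
    }
}
```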

You can't change this mechanism and force the regex engine to try every case of regex1|regex2|regex3 and pick the best one, because

  • it would reduce performance,
  • and, probably more important, what should happen if several cases, like regex1 and regex2, could match? For instance, if the regex is a|aa and the data is aaaa, where should the search for the next match start: at a:aaa or at aa:aa (: represents the regex cursor)?
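To make the cost concrete, here is roughly what "try every alternative and keep the longest" would involve at a single position. The helper longestAt is hypothetical and not part of java.util.regex: it compiles each alternative separately and runs lookingAt() on a region, so each position costs one matching attempt per alternative, and it only settles the ambiguity above by hard-coding one policy (longest wins).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LongestAlternative {
    // Hypothetical helper (not part of java.util.regex): tries every alternative
    // at the same start position and returns the longest text any of them matches
    // there, or null if none match. It costs one matching attempt per alternative.
    static String longestAt(List<Pattern> alternatives, String input, int start) {
        String best = null;
        for (Pattern alternative : alternatives) {
            Matcher matcher = alternative.matcher(input);
            // Restrict matching to input[start..end) and require the match to begin at start
            matcher.region(start, input.length());
            if (matcher.lookingAt()
                    && (best == null || matcher.group().length() > best.length())) {
                best = matcher.group();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Pattern> alternatives = Arrays.asList(Pattern.compile("a"), Pattern.compile("aa"));
        System.out.println(longestAt(alternatives, "aaaa", 0)); // aa
    }
}
```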

So you could rewrite your regex so that each case is matched in full, placing the more specific alternatives before the more general ones, like:

    (foo bar)|(foo baz)|(foo)
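A minimal check of that ordering, assuming the goal is just to know which alternative fired. The class SpecificFirstDemo and its helper matchedAlternative are illustrative, not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpecificFirstDemo {
    // Most specific alternatives first, the bare "foo" fallback last.
    static final Pattern PATTERN = Pattern.compile("(foo bar)|(foo baz)|(foo)");

    // Returns the group number (1-3) of the alternative that matched, or 0 if none did.
    static int matchedAlternative(String input) {
        Matcher matcher = PATTERN.matcher(input);
        if (!matcher.find()) {
            return 0;
        }
        for (int i = 1; i <= matcher.groupCount(); i++) {
            if (matcher.group(i) != null) {
                return i;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(matchedAlternative("foo baz")); // 2
        System.out.println(matchedAlternative("foo"));     // 3
        System.out.println(matchedAlternative("foo qux")); // 3
    }
}
```

Note that this pattern also accepts a bare "foo" with no trailing space, unlike the question's original pattern.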

You could also rewrite it as:

    (foo) (?:(bar)|(baz))?
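With this form there is only one "foo" group, and the optional non-capturing group (?:...) chooses between bar and baz. A sketch that returns groups 1 to 3 (the names NestedOptionalDemo and groups are hypothetical):

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestedOptionalDemo {
    // Group 1 = "foo", group 2 = "bar", group 3 = "baz"; (?:...) itself captures nothing.
    static final Pattern PATTERN = Pattern.compile("(foo) (?:(bar)|(baz))?");

    // Returns groups 1-3 of the first match (null entries for groups that did not
    // participate), or null if there is no match at all.
    static String[] groups(String input) {
        Matcher matcher = PATTERN.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        // The greedy ? tries bar|baz before settling for nothing, so "baz" lands in group 3
        System.out.println(Arrays.toString(groups("foo baz"))); // [foo, null, baz]
        System.out.println(Arrays.toString(groups("foo qux"))); // [foo, null, null]
        // The space after (foo) is outside the optional part, so plain "foo" fails
        System.out.println(groups("foo") == null); // true
    }
}
```

One difference from (foo bar)|(foo baz)|(foo): the space after (foo) sits outside the optional part, so "foo" on its own does not match, just as with the question's original pattern.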

Upvotes: 2

Jeff Bowman

Reputation: 95764

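Your regular expression is behaving as expected: your input "foo baz" matches ((foo) (bar)?), or at least the "foo " part does. Java's regex engine tries the alternatives of | from left to right and stops at the first one that succeeds; the greedy ? only decides what each alternative tries first, not which alternative wins. Because "foo " already satisfies the first alternative (with (bar)? matching nothing), ((foo) (baz)?) is never tried.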

If you want each alternative to match the entire input, you'll need ^ and $:

    Pattern pattern = Pattern.compile("^((foo) (bar)?)$|^((foo) (baz)?)$");
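A small harness to check this, assuming you want the same six groups the question prints. The names AnchoredAlternativesDemo and groups are made up for the example. The second call runs the question's unanchored pattern through Matcher.matches(), which only succeeds when the whole input matches, so the engine backtracks into the second alternative there too.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchoredAlternativesDemo {
    // ^ and $ force each alternative to cover the whole input.
    static final Pattern ANCHORED =
            Pattern.compile("^((foo) (bar)?)$|^((foo) (baz)?)$");
    // The question's original pattern, without anchors.
    static final Pattern ORIGINAL =
            Pattern.compile("((foo) (bar)?)|((foo) (baz)?)");

    // Returns every capturing group (null where a group did not participate),
    // or null if the pattern does not match. With wholeInput = true it uses
    // Matcher.matches(); otherwise it uses Matcher.find().
    static String[] groups(Pattern pattern, String input, boolean wholeInput) {
        Matcher matcher = pattern.matcher(input);
        boolean found = wholeInput ? matcher.matches() : matcher.find();
        if (!found) {
            return null;
        }
        String[] result = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            result[i - 1] = matcher.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // find() with the anchored pattern
        System.out.println(Arrays.toString(groups(ANCHORED, "foo baz", false)));
        // matches() with the original, unanchored pattern
        System.out.println(Arrays.toString(groups(ORIGINAL, "foo baz", true)));
    }
}
```

Both lines print [null, null, null, foo baz, foo, baz], which is the output the question was hoping for.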

Upvotes: 3
