Ravi

Reputation: 971

Matcher.group() not returning correct value when more than one pattern is combined

I have the following code.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("Group1 (.*), Group2=(\\[(.*)\\]|null) ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]");
        String string = "Group1 12345, Group2=null ,Group3=[group3] ,Group4=[group4]";
        Matcher matcher = pattern.matcher(string);
        matcher.find();

        for (int i = 1; i <= matcher.groupCount(); i++) {
            System.out.println(i + ": " +matcher.group(i));
        }
        System.out.println();

        string = "Group1 12345, Group2=[group2] ,Group3=[group3] ,Group4=[group4]";

        for (int i = 1; i <= matcher.groupCount(); i++) {
            System.out.println(i + ": " +matcher.group(i));
        }
    }
}

Output given by the above code:

1: 12345
2: null
3: null
4: group3
5: group4

1: 12345
2: null
3: null
4: group3
5: group4

Question 1: Why am I getting the groupCount as 5? Is it due to multiple regex patterns combined (at Group2)?

Question 2: I expect the output to be:

12345
null
group3
group4

12345
group2
group3
group4

What should I do to print the output the way I expect?

Please help me understand the program correctly. Thanks

Upvotes: 0

Views: 242

Answers (2)

assylias

Reputation: 328795

Why 5 groups?

Group1 (.*), Group2=(\\[(.*)\\]|null) ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]
       ^            ^   ^                        ^                  ^
       1            2   3                        4                  5

Basically, you just need to count the opening parentheses: each unescaped ( that opens a capturing group gets the next group number, from left to right.

So that should explain your first output.
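A minimal sketch to check the count, using the pattern from the question unchanged. The class name GroupCountDemo and the helper countGroups are made up for illustration. Since groupCount() depends only on the pattern, a matcher over an empty string is enough to read it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // The pattern from the question, unchanged.
    static final String REGEX =
        "Group1 (.*), Group2=(\\[(.*)\\]|null) ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]";

    // Hypothetical helper: groupCount() depends only on the pattern,
    // so matching against an empty input is sufficient.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(countGroups(REGEX)); // 5

        Matcher m = Pattern.compile(REGEX)
            .matcher("Group1 12345, Group2=null ,Group3=[group3] ,Group4=[group4]");
        if (m.find()) {
            // Group 2 holds the four characters "null" matched by the alternation.
            System.out.println(m.group(2));
            // Group 3 did not take part in the match, so group(3) is a Java null.
            System.out.println(m.group(3) == null); // true
        }
    }
}
```

This also shows why lines 2 and 3 of the first output both read "null": one is matched text, the other is a group that never matched.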

As for the second output, your matcher is still pointing to the first string. So you need to include:

string = "Group1 12345, Group2=[group2] ,Group3=[group3] ,Group4=[group4]";
matcher = pattern.matcher(string);
matcher.find();

before the last loop.
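As a rough sketch of that fix, the loop can be moved into a helper that builds a fresh matcher for every input, so a stale matcher cannot be reused by accident. The class name RematchDemo and the helper groups are made up for illustration; the pattern is still the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RematchDemo {
    static final Pattern PATTERN = Pattern.compile(
        "Group1 (.*), Group2=(\\[(.*)\\]|null) ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]");

    // Hypothetical helper: creates a new matcher for the given input and
    // returns groups 1..groupCount(), or an empty list if nothing matches.
    static List<String> groups(String input) {
        Matcher matcher = PATTERN.matcher(input);
        List<String> result = new ArrayList<>();
        if (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                result.add(matcher.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // [12345, null, null, group3, group4]
        System.out.println(groups("Group1 12345, Group2=null ,Group3=[group3] ,Group4=[group4]"));
        // [12345, [group2], group2, group3, group4]
        System.out.println(groups("Group1 12345, Group2=[group2] ,Group3=[group3] ,Group4=[group4]"));
    }
}
```

With the original pattern the second string now matches properly, but it still yields five groups, and group 2 includes the brackets.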

Finally, to get the expected output, I would simply use this:

Pattern.compile("Group1 (.*), Group2=\\[?(.*?)\\]? ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]");

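which is reasonably simple but no longer enforces that Group2 needs brackets for non-null values. If you want to keep that condition, keep your original pattern and add a check such as if (matcher.group(3) == null) { ... }: group(3) returns null when the inner group did not take part in the match, which is what happens when Group2 is the literal null.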

Pattern explanation for group 2:

\\[?   There may be an opening bracket or not; don't capture it
(.*?)  Capture what's after "Group2=", excluding the brackets
\\]?   There may be a closing bracket or not; don't capture it

Note: the ? in (.*?) makes the quantifier lazy (reluctant), so it matches as few characters as possible. This keeps the closing bracket, when there is one, out of the capture.
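To see the simplified pattern in action, here is a small sketch that runs it against both strings from the question. The class name LazyGroup2Demo and the helper extract are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyGroup2Demo {
    // Simplified pattern: optional brackets around a lazy Group2 capture.
    static final Pattern PATTERN = Pattern.compile(
        "Group1 (.*), Group2=\\[?(.*?)\\]? ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]");

    // Hypothetical helper: returns groups 1..groupCount() for one input,
    // or an empty list if the pattern is not found.
    static List<String> extract(String input) {
        Matcher matcher = PATTERN.matcher(input);
        List<String> result = new ArrayList<>();
        if (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                result.add(matcher.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // [12345, null, group3, group4]
        System.out.println(extract("Group1 12345, Group2=null ,Group3=[group3] ,Group4=[group4]"));
        // [12345, group2, group3, group4]
        System.out.println(extract("Group1 12345, Group2=[group2] ,Group3=[group3] ,Group4=[group4]"));
    }
}
```

In the first result, the second element is the four-character string "null" captured from the input, not a Java null reference.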

Upvotes: 2

Aaron

Reputation: 24812

Two capturing groups correspond to your Group2 label:

(\\[(.*)\\]|null)
^---------------^
    ^--^

You could use a non-capturing group for the inner one:

(\\[(?:.*)\\]|null)

Or, in this specific case, since the inner group seems useless (it is neither referenced later nor used to apply a quantifier to a group of tokens), you should just remove it:

(\\[.*\\]|null)
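A short sketch of what this change does, assuming the rest of the question's pattern stays as it was. The class name FlatGroup2Demo and the helper group2 are made up for illustration. The group count drops to 4, but group 2 still captures the brackets when they are present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlatGroup2Demo {
    // The question's pattern with the inner Group2 parentheses removed.
    static final Pattern PATTERN = Pattern.compile(
        "Group1 (.*), Group2=(\\[.*\\]|null) ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]");

    // Hypothetical helper: returns group 2 for the input,
    // or a Java null if the pattern is not found at all.
    static String group2(String input) {
        Matcher matcher = PATTERN.matcher(input);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(PATTERN.matcher("").groupCount()); // 4
        // Prints the matched text: null
        System.out.println(group2("Group1 12345, Group2=null ,Group3=[group3] ,Group4=[group4]"));
        // Prints: [group2]
        System.out.println(group2("Group1 12345, Group2=[group2] ,Group3=[group3] ,Group4=[group4]"));
    }
}
```

If the brackets should not appear in the output, they would have to be stripped afterwards or left outside the capturing group.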

Upvotes: 1
