I have the following code.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("Group1 (.*), Group2=(\\[(.*)\\]|null) ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]");
        String string = "Group1 12345, Group2=null ,Group3=[group3] ,Group4=[group4]";
        Matcher matcher = pattern.matcher(string);
        matcher.find();
        for (int i = 1; i <= matcher.groupCount(); i++) {
            System.out.println(i + ": " + matcher.group(i));
        }
        System.out.println();
        string = "Group1 12345, Group2=[group2] ,Group3=[group3] ,Group4=[group4]";
        for (int i = 1; i <= matcher.groupCount(); i++) {
            System.out.println(i + ": " + matcher.group(i));
        }
    }
}
Output of the above code:
1: 12345
2: null
3: null
4: group3
5: group4

1: 12345
2: null
3: null
4: group3
5: group4
Question 1: Why does groupCount() return 5? Is it because of the combined patterns at Group2?
Question 2: I expect the output to be:
12345
null
group3
group4

12345
group2
group3
group4

What should I do to get this output?
Please help me understand the program correctly. Thanks
Why 5 groups?
Group1 (.*), Group2=(\\[(.*)\\]|null) ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]
       ^            ^   ^                        ^                  ^
       1            2   3                        4                  5
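Basically, you just need to count the opening parentheses: every unescaped ( starts a capturing group (unless it starts a non-capturing construct such as (?:...)), nested ones included, and groups are numbered from left to right by the position of their opening parenthesis.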
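That explains your first output: group 2 matched the literal text null, so group 3, which only exists inside the bracketed alternative, did not take part in the match and matcher.group(3) returns null.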
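A small sketch to check the counting rule; the class and method names (GroupCountDemo, countGroups) are illustrative only. It relies on Matcher.groupCount(), which reports the number of capturing groups in the matcher's pattern whether or not a match has been attempted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Number of capturing groups the regex defines. groupCount() is a
    // property of the pattern, so no match attempt is needed.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // The question's pattern: the (.*) nested inside Group2's
        // alternation is a capturing group of its own, giving five in total.
        System.out.println(countGroups(
                "Group1 (.*), Group2=(\\[(.*)\\]|null) ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]")); // 5

        // Escaped parentheses are literals and create no group.
        System.out.println(countGroups("\\((a)\\)")); // 1

        // Groups are numbered by the position of their opening parenthesis,
        // so an outer group gets a lower number than the groups nested in it.
        Matcher m = Pattern.compile("((a)(b))").matcher("ab");
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println(i + ": " + m.group(i)); // 1: ab, 2: a, 3: b
            }
        }
    }
}
```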
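As for the second output, your matcher is still working on the first string: assigning a new value to the string variable does not change the input of a Matcher that has already been created. So you need to include: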
string = "Group1 12345, Group2=[group2] ,Group3=[group3] ,Group4=[group4]";
matcher = pattern.matcher(string);
matcher.find();
before the last loop.
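Putting the fix together, a minimal runnable sketch might look like this (the class name RematchDemo and the helper groups are hypothetical). Instead of creating a second Matcher, it calls Matcher.reset(CharSequence), which makes an existing matcher read new input; creating a new matcher with pattern.matcher(string) as shown above works equally well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RematchDemo {
    static final Pattern PATTERN = Pattern.compile(
            "Group1 (.*), Group2=(\\[(.*)\\]|null) ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]");

    // Collects group(1)..group(groupCount()) of the next match,
    // or returns null if the pattern does not match.
    static List<String> groups(Matcher matcher) {
        if (!matcher.find()) {
            return null;
        }
        List<String> result = new ArrayList<>();
        for (int i = 1; i <= matcher.groupCount(); i++) {
            result.add(matcher.group(i));
        }
        return result;
    }

    public static void main(String[] args) {
        String first = "Group1 12345, Group2=null ,Group3=[group3] ,Group4=[group4]";
        String second = "Group1 12345, Group2=[group2] ,Group3=[group3] ,Group4=[group4]";

        Matcher matcher = PATTERN.matcher(first);
        System.out.println(groups(matcher)); // [12345, null, null, group3, group4]

        // Reassigning a String variable would not change what the matcher reads;
        // reset(CharSequence) points the same Matcher at the new input.
        matcher.reset(second);
        System.out.println(groups(matcher)); // [12345, [group2], group2, group3, group4]
    }
}
```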
Finally, to get the expected output, I would simply use this:
Pattern.compile("Group1 (.*), Group2=\\[?(.*?)\\]? ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]");
which is reasonably simple, but loses the fact that Group2 needs brackets for non-null values. If you want to keep that constraint, you will need to check it in code instead, for example by keeping your original pattern and testing whether matcher.group(3) is null (it is null when Group2 matched the literal null).
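One way to keep the bracket requirement, sketched under the assumption that you keep the question's original five-group pattern: read group 3 when the bracketed branch matched and fall back to group 2 (the literal text null) otherwise. StrictGroup2Demo and extract are illustrative names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictGroup2Demo {
    // Original pattern from the question: brackets are required unless the
    // value is the literal "null". Group 3 only participates in the bracketed branch.
    static final Pattern PATTERN = Pattern.compile(
            "Group1 (.*), Group2=(\\[(.*)\\]|null) ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]");

    static List<String> extract(String input) {
        Matcher matcher = PATTERN.matcher(input);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No match: " + input);
        }
        List<String> values = new ArrayList<>();
        values.add(matcher.group(1));
        // group(3) is null when the "null" alternative matched; fall back to
        // group(2), which then holds the literal text "null".
        values.add(matcher.group(3) != null ? matcher.group(3) : matcher.group(2));
        values.add(matcher.group(4));
        values.add(matcher.group(5));
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("Group1 12345, Group2=null ,Group3=[group3] ,Group4=[group4]"));
        // [12345, null, group3, group4]
        System.out.println(extract("Group1 12345, Group2=[group2] ,Group3=[group3] ,Group4=[group4]"));
        // [12345, group2, group3, group4]

        // A non-null value without brackets is still rejected.
        System.out.println(PATTERN.matcher(
                "Group1 12345, Group2=group2 ,Group3=[group3] ,Group4=[group4]").find()); // false
    }
}
```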
Pattern explanation for group 2:
\\[? There may be an opening bracket or not, don't capture it
(.*?) Capture what's after "Group2=", excluding the brackets
\\]? There may be a closing bracket or not, don't capture it
Note that the ? in (.*?) makes the * quantifier lazy (reluctant): it matches as few characters as possible, which keeps the closing bracket out of the captured text when there is one.
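A minimal sketch that runs the suggested pattern against both input strings from the question; OptionalBracketsDemo and extract are illustrative names. Because the brackets sit outside the only Group2 capturing group, groupCount() is 4 and each line prints just the value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalBracketsDemo {
    // Suggested pattern: optional, uncaptured brackets around Group2's value,
    // with a lazy (.*?) so the closing bracket is not swallowed.
    static final Pattern PATTERN = Pattern.compile(
            "Group1 (.*), Group2=\\[?(.*?)\\]? ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]");

    static String[] extract(String input) {
        Matcher matcher = PATTERN.matcher(input);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No match: " + input);
        }
        String[] values = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            values[i - 1] = matcher.group(i);
        }
        return values;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Group1 12345, Group2=null ,Group3=[group3] ,Group4=[group4]",
            "Group1 12345, Group2=[group2] ,Group3=[group3] ,Group4=[group4]"
        };
        // Prints 12345, null, group3, group4, then 12345, group2, group3, group4,
        // with an empty line after each input.
        for (String input : inputs) {
            for (String value : extract(input)) {
                System.out.println(value);
            }
            System.out.println();
        }
    }
}
```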
Two capturing groups correspond to your Group2 label:
(\\[(.*)\\]|null)
^---------------^
    ^--^
You could use a non-capturing group for the inner one:
(\\[(?:.*)\\]|null)
Or, in this specific case, since the group seems useless (it is used neither for a later reference nor for applying a quantifier to a group of tokens), you should just remove it:
(\\[.*\\]|null)
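A sketch comparing the three ways of writing Group2's part of the pattern (class and constant names are illustrative). Making the inner group non-capturing or removing it both bring groupCount() down to 4, but group 2 then still includes the brackets, e.g. [group2], so this change alone does not give the exact output asked for in Question 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Group2VariantsDemo {
    // Three ways to write Group2's part of the pattern.
    static final String[] GROUP2_VARIANTS = {
        "(\\[(.*)\\]|null)",   // original: outer group plus a nested capturing group
        "(\\[(?:.*)\\]|null)", // inner group made non-capturing
        "(\\[.*\\]|null)"      // inner group removed
    };

    static Pattern build(String group2) {
        return Pattern.compile("Group1 (.*), Group2=" + group2
                + " ,Group3=\\[(.*)\\] ,Group4=\\[(.*)\\]");
    }

    public static void main(String[] args) {
        String input = "Group1 12345, Group2=[group2] ,Group3=[group3] ,Group4=[group4]";
        for (String variant : GROUP2_VARIANTS) {
            Matcher matcher = build(variant).matcher(input);
            if (matcher.find()) {
                // groupCount() is 5, 4, 4; group(2) is "[group2]" in every case
                System.out.println(variant + " -> groupCount=" + matcher.groupCount()
                        + ", group(2)=" + matcher.group(2));
            }
        }
    }
}
```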