I'm trying to match repeating groups with Java:
    String s = "The very first line\n"
            + "\n"
            + "AA (aa)\n"
            + "BB (bb)\n"
            + "CC (cc)\n"
            + "\n";
    Pattern p = Pattern.compile(
            "The very first line\\s+"
            + "((?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\)\\s*)+",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(s);
    if (m.find()) {
        for (int i = 0; i <= m.groupCount(); i++) {
            System.out.println("group #" + i + ": [" + m.group(i).trim() + "]");
        }
        System.out.println("group gr1: [" + m.group("gr1").trim() + "]");
        System.out.println("group gr2: [" + m.group("gr2").trim() + "]");
    }
The problem is with the repeating groups: although the regex matches the whole text block (see group #0 in the output below), retrieving groups #2 and #3 (or the same groups by name, gr1/gr2) returns only the last match (CC/cc) and skips the previous ones (AA/aa and BB/bb):
group #0: [The very first line
AA (aa)
BB (bb)
CC (cc)]
group #1: [CC (cc)]
group #2: [CC]
group #3: [cc]
group gr1: [CC]
group gr2: [cc]
Is there a way to solve this?
Edit: "The very first line" is in the pattern as an identifying string; see the comment on gknicker's answer below.
Reputation: 5569
It seems like you wanted your pattern to match not the whole input string, but just the individual repeating sections. If that's true, your pattern would be:
    Pattern p = Pattern.compile(
            "((?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\))",
            Pattern.CASE_INSENSITIVE);
Then in this case you would have a while loop to find each match:
    Matcher m = p.matcher(s);
    while (m.find()) {
        System.out.println("group gr1: ["
                + m.group("gr1").trim() + "]");
        System.out.println("group gr2: ["
                + m.group("gr2").trim() + "]");
    }
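For reference, here is a self-contained sketch of that loop, run against the sample string from the question. The class name PairFinder and the method findPairs are made up for illustration, and the redundant outer capturing group is dropped; otherwise the pattern is the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairFinder {
    // Matches a single "NAME (value)" entry. Hypothetical helper class,
    // not part of the question's code.
    private static final Pattern ENTRY = Pattern.compile(
            "(?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\)",
            Pattern.CASE_INSENSITIVE);

    // Collects every entry as "gr1/gr2" by calling find() repeatedly.
    public static List<String> findPairs(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            pairs.add(m.group("gr1") + "/" + m.group("gr2"));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String s = "The very first line\n"
                + "\n"
                + "AA (aa)\n"
                + "BB (bb)\n"
                + "CC (cc)\n"
                + "\n";
        System.out.println(findPairs(s));
    }
}
```

Each call to find() resumes where the previous match ended, so every entry gets its own gr1/gr2 values instead of being overwritten inside one repeated group. Note that this version does not check for the identifying first line at all.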
But if you need the whole match, you'll probably have to use two patterns like this:
    String s = "The very first line\n"
            + "\n"
            + "AA (aa)\n"
            + "BB (bb)\n"
            + "CC (cc)\n"
            + "\n";
    Pattern p = Pattern.compile(
            "The very first line\\s+(([a-z]+)\\s+\\(([^)]+)\\)\\s*)+",
            Pattern.CASE_INSENSITIVE);
    Pattern p2 = Pattern.compile(
            "((?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\))",
            Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(s);
    while (m.find()) {
        Matcher m2 = p2.matcher(m.group());
        while (m2.find()) {
            System.out.println("group gr1: ["
                    + m2.group("gr1").trim() + "]");
            System.out.println("group gr2: ["
                    + m2.group("gr2").trim() + "]");
        }
    }
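A slightly tighter variant of the two-pattern idea, as a sketch only: the outer pattern captures just the repeated section in group 1 (using a non-capturing group for the repetition), and the inner pattern scans that captured text rather than the whole match. The names BlockParser and parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockParser {
    // Outer pattern: the identifying line, then the whole run of entries
    // captured as group 1. The (?: ... )+ repetition is non-capturing.
    private static final Pattern BLOCK = Pattern.compile(
            "The very first line\\s+((?:[a-z]+\\s+\\([^)]+\\)\\s*)+)",
            Pattern.CASE_INSENSITIVE);

    // Inner pattern: one "NAME (value)" entry.
    private static final Pattern ENTRY = Pattern.compile(
            "(?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\)",
            Pattern.CASE_INSENSITIVE);

    // Returns "gr1/gr2" for every entry that follows the identifying line.
    public static List<String> parse(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher block = BLOCK.matcher(input);
        while (block.find()) {
            Matcher entry = ENTRY.matcher(block.group(1));
            while (entry.find()) {
                pairs.add(entry.group("gr1") + "/" + entry.group("gr2"));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String s = "XX (xx)\n"
                + "The very first line\n"
                + "\n"
                + "AA (aa)\n"
                + "BB (bb)\n";
        // XX (xx) precedes the identifying line, so it is not reported.
        System.out.println(parse(s));
    }
}
```

Scanning only group 1 keeps the identifying line itself out of the inner search, which matters if that line could ever contain something that looks like an entry.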