Nobwyn
Nobwyn

Reputation: 8685

How to match nested repeating groups with regex in Java?

I'm trying to match repeating groups with Java:

String s = "The very first line\n"
        + "\n"
        + "AA (aa)\n"
        + "BB (bb)\n"
        + "CC (cc)\n"
        + "\n";

Pattern p = Pattern.compile(
        "The very first line\\s+"
        + "((?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\)\\s*)+",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

Matcher m = p.matcher(s);

if (m.find()) {
    for (int i = 0; i <= m.groupCount(); i++) {
        System.out.println("group #" + i + ": [" + m.group(i).trim() + "]");
    }
    System.out.println("group gr1: [" + m.group("gr1").trim() + "]");
    System.out.println("group gr2: [" + m.group("gr2").trim() + "]");
}

The problem is with the repeating groups: though the regex matches the whole text block (see group #0 in output example below), when retrieving groups #2 and #3 (or by name as well - gr1/gr2) it does return only the last match (CC/cc) and skips the previous ones (AA/aa and BB/bb)

group #0: [The very first line

AA (aa)
BB (bb)
CC (cc)]
group #1: [CC (cc)]
group #2: [CC]
group #3: [cc]
group gr1: [CC]
group gr2: [cc]

Is there a way to solve this?

edit: The very first line is in the pattern as identification string - see the comment to the gknicker's answer below

Upvotes: 1

Views: 1378

Answers (1)

gknicker
gknicker

Reputation: 5569

It seems like you wanted your pattern to match not the whole input string, but just the individual repeating sections. If that's true, your pattern would be:

    Pattern p = Pattern.compile(
        "((?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\))",
        Pattern.CASE_INSENSITIVE);

Then in this case you would have a while loop to find each match:

    Matcher m = p.matcher(s);

    while (m.find()) {
        System.out.println("group gr1: ["
            + m.group("gr1").trim() + "]");
        System.out.println("group gr2: ["
            + m.group("gr2").trim() + "]");
    }

But if you need the whole match, you'll probably have to use two patterns like this:

    String s = "The very first line\n"
        + "\n"
        + "AA (aa)\n"
        + "BB (bb)\n"
        + "CC (cc)\n"
        + "\n";

    Pattern p = Pattern.compile(
        "The very first line\\s+(([a-z]+)\\s+\\(([^)]+)\\)\\s*)+",
        Pattern.CASE_INSENSITIVE);

    Pattern p2 = Pattern.compile(
        "((?<gr1>[a-z]+)\\s+\\((?<gr2>[^)]+)\\))",
        Pattern.CASE_INSENSITIVE);

    Matcher m = p.matcher(s);
    while (m.find()) {
        Matcher m2 = p2.matcher(m.group());
        while (m2.find()) {
            System.out.println("group gr1: ["
                + m2.group("gr1").trim() + "]");
            System.out.println("group gr2: ["
                + m2.group("gr2").trim() + "]");
        }
    }

Upvotes: 1

Related Questions