Chad Chaddington

Reputation: 21

How To Match Repeating Sub-Patterns

Let's say I have a string:

String sentence = "My nieces are Cara:8 Sarah:9 Tara:10";

And I would like to find all their respective names and ages with the following pattern matcher:

String regex = "My\\s+nieces\\s+are((\\s+(\\S+):(\\d+))*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(sentence);

I understand something like

matcher.find(0); // resets "pointer"
String niece = matcher.group(2);
String nieceName = matcher.group(3);
String nieceAge = matcher.group(4);

would give me my last niece (" Tara:10", "Tara", "10").

How would I collect all of my nieces instead of only the last, using only one regex/pattern?

I would like to avoid splitting the string.

Upvotes: 2

Views: 74

Answers (2)

bobble bubble

Reputation: 18490

Another idea is to use the \G anchor, which matches where the previous match ended (or at the start of the input on the first attempt).

String regex = "(?:\\G(?!\\A)|My\\s+nieces\\s+are)\\s+(\\S+):(\\d+)";
  • My\s+nieces\s+are matches the prefix once
  • \G then chains each following match to the end of the previous one
  • (?!\A) is a negative lookahead that prevents \G from matching at \A (the start of the input)
  • \s+(\S+):(\d+) uses two capturing groups to extract the name and the age

See this demo at regex101 or a Java demo at tio.run

Matcher m = Pattern.compile(regex).matcher(sentence);

while (m.find()) {
  System.out.println(m.group(1));
  System.out.println(m.group(2));
}
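
For reference, here is a self-contained sketch of the same loop that collects the pairs into a map instead of printing them. The class name NieceFinder and the helper findNieces are made up for illustration; the regex is the one shown above. It assumes the ages fit in an int.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NieceFinder {
    // Regex from this answer: the literal prefix starts the chain,
    // \G(?!\A) continues it from the end of the previous match.
    private static final Pattern NIECE = Pattern.compile(
            "(?:\\G(?!\\A)|My\\s+nieces\\s+are)\\s+(\\S+):(\\d+)");

    // Hypothetical helper: returns name -> age in order of appearance.
    static Map<String, Integer> findNieces(String sentence) {
        Map<String, Integer> nieces = new LinkedHashMap<>();
        Matcher m = NIECE.matcher(sentence);
        while (m.find()) {
            // group(1) is the name, group(2) the age (digits only).
            nieces.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return nieces;
    }

    public static void main(String[] args) {
        String sentence = "My nieces are Cara:8 Sarah:9 Tara:10";
        // Prints each name with its age, one niece per line.
        findNieces(sentence).forEach((name, age) -> System.out.println(name + " " + age));
    }
}
```

Because every match after the first has to start exactly where the previous one ended, a string without the "My nieces are" prefix yields an empty map rather than stray name:age pairs.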

Upvotes: 2

shmosel

Reputation: 50716

You can't iterate over repeating groups, but you can match each group individually, calling find() in a loop to get the details of each one. If they need to be back-to-back, you can iteratively bound your matcher to the last index, like this:

Matcher matcher = Pattern.compile("My\\s+nieces\\s+are").matcher(sentence);
if (matcher.find()) {
    int boundary = matcher.end();
    
    matcher = Pattern.compile("^\\s+(\\S+):(\\d+)").matcher(sentence);
    while (matcher.region(boundary, sentence.length()).find()) {
        System.out.println(matcher.group());
        System.out.println(matcher.group(1));
        System.out.println(matcher.group(2));
        
        boundary = matcher.end();
    }
}
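
A possible variation on the loop above, offered as a sketch rather than a replacement: Matcher.lookingAt() only matches at the start of the current region, so the ^ in the second pattern can be dropped. The class name RegionNieces and the helper collect are hypothetical, as is the "name=age" result format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionNieces {
    private static final Pattern PREFIX = Pattern.compile("My\\s+nieces\\s+are");
    // No ^ needed: lookingAt() already anchors at the start of the region.
    private static final Pattern ENTRY = Pattern.compile("\\s+(\\S+):(\\d+)");

    // Hypothetical helper: returns "name=age" strings for back-to-back entries.
    static List<String> collect(String sentence) {
        List<String> result = new ArrayList<>();
        Matcher prefix = PREFIX.matcher(sentence);
        if (!prefix.find()) {
            return result; // prefix missing, nothing to collect
        }
        int boundary = prefix.end();
        Matcher entry = ENTRY.matcher(sentence);
        // Move the region start to the end of the last entry on each pass.
        while (entry.region(boundary, sentence.length()).lookingAt()) {
            result.add(entry.group(1) + "=" + entry.group(2));
            boundary = entry.end();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collect("My nieces are Cara:8 Sarah:9 Tara:10"));
    }
}
```

As with the original loop, collection stops at the first token that is not a name:age pair, so "My nieces are Cara:8 and Tara:10" yields only the entry for Cara.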

Upvotes: 2
