Reputation: 739
I have an input string
invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend
I want to fetch only the subdata parts of it. I tried:
Pattern p = Pattern.compile("(?<=sufixpart).*?(subdata.)+.*?(?=end)", Pattern.DOTALL);
Matcher m = p.matcher(inputString);
while (m.find()) {
    System.out.println(m.group(1));
}
But I get only the first match. How can I get all the subdata, such as [subdata1,subdata2,subdata3]?
Upvotes: 3
Views: 689
Reputation: 627536
I'd go for a simpler approach: get the blocks first with a regex like start(.*?)end, and then extract all the matches from Group 1 with a mere subdata\S*-like regex.
See the Java demo:
// Requires: import java.util.*; import java.util.regex.*;
String rx = "(?sm)^sufixpart$(.*?)^end$";
String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend\ninvalidsufix\nsubadatax\nsufixpart\nsubdata001\nsomerandomn\nsubdata002\nsubdata00n\nend";
Pattern pattern_outer = Pattern.compile(rx);
Pattern pattern_token = Pattern.compile("(?m)^subdata\\S*$");
Matcher matcher = pattern_outer.matcher(s);
List<List<String>> res = new ArrayList<>();
while (matcher.find()) {
    List<String> lst = new ArrayList<>();
    if (!matcher.group(1).isEmpty()) { // If Group 1 is not empty
        Matcher m = pattern_token.matcher(matcher.group(1)); // Init the second matcher
        while (m.find()) { // While a token is found
            lst.add(m.group(0)); // add it to the list
        }
    }
    res.add(lst); // Add the list to the result list
}
System.out.println(res); // => [[subdata1, subdata2, subdatan], [subdata001, subdata002, subdata00n]]
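If you are on Java 9 or later, the same two-step idea (match the blocks, then match the tokens inside Group 1) can probably be written more compactly with Matcher.results(). This is only a sketch: the class name BlockTokens and the method name extract are made up for illustration, and the patterns are the ones from the demo above.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class; the names are illustrative only.
class BlockTokens {
    // Outer pattern: text between a whole "sufixpart" line and the next whole "end" line.
    private static final Pattern BLOCK = Pattern.compile("(?sm)^sufixpart$(.*?)^end$");
    // Inner pattern: a whole line that starts with "subdata".
    private static final Pattern TOKEN = Pattern.compile("(?m)^subdata\\S*$");

    // Returns one list of subdata tokens per sufixpart...end block.
    static List<List<String>> extract(String input) {
        return BLOCK.matcher(input).results()              // Java 9+: stream of block matches
                .map(block -> TOKEN.matcher(block.group(1)).results()
                        .map(MatchResult::group)           // whole token match
                        .collect(Collectors.toList()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend";
        System.out.println(extract(s)); // [[subdata1, subdata2, subdatan]]
    }
}
```

Compiling both patterns once as constants avoids recompiling them on every call; on Java 8 the explicit while loops shown above are still the way to go.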
Another approach is to use a \G-based regex:
(?sm)(?:\G(?!\A)|^sufixpart$)(?:(?!^(?:sufixpart|end)$).)*?(subdata\S*)(?=.*?^end$)
See the regex demo
Explanation:
(?sm) - enables DOTALL and MULTILINE modes
(?:\G(?!\A)|^sufixpart$) - matches either the end of the previous successful match (\G(?!\A)) or a whole line with sufixpart text on it (^sufixpart$)
(?:(?!^(?:sufixpart|end)$).)*? - matches any char, 0+ times and as few as possible, that is not the starting point of a whole line equal to sufixpart or end
(subdata\S*) - Group 1 matching subdata and 0+ non-whitespace chars
(?=.*?^end$) - there must be an end line after any 0+ chars.
In the Java code, the first group is made capturing, so a non-empty Group 1 signals the start of a new block and the subdata value is in Group 2:
String rx = "(?sm)(\\G(?!\\A)|^sufixpart$)(?:(?!^(?:sufixpart|end)$).)*?(subdata\\S*)(?=.*?^end$)";
String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend\ninvalidsufix\nsubadatax\nsufixpart\nsubdata001\nsomerandomn\nsubdata002\nsubdata00n\nend";
Pattern pattern = Pattern.compile(rx);
Matcher matcher = pattern.matcher(s);
List<List<String>> res = new ArrayList<>();
List<String> lst = null;
while (matcher.find()) {
    if (!matcher.group(1).isEmpty()) { // Group 1 is "sufixpart": a new block starts
        if (lst != null) res.add(lst); // Store the previous block's list
        lst = new ArrayList<>();
    }
    lst.add(matcher.group(2)); // Group 2 holds the subdata value
}
if (lst != null) res.add(lst); // Store the last block's list
System.out.println(res); // => [[subdata1, subdata2, subdatan], [subdata001, subdata002, subdata00n]]
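If you do not need the values grouped per block (the question only has one block), the \G-based regex from the explanation can be used as is, with the subdata value in Group 1. A minimal sketch, assuming a flat list is acceptable; the class name ChainedSubdata and the method name extractFlat are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class ChainedSubdata {
    // Same regex as in the explanation: the first group is non-capturing,
    // so the subdata value is in Group 1.
    private static final Pattern P = Pattern.compile(
            "(?sm)(?:\\G(?!\\A)|^sufixpart$)(?:(?!^(?:sufixpart|end)$).)*?(subdata\\S*)(?=.*?^end$)");

    // Collects every subdata token found between a sufixpart line and an end line.
    static List<String> extractFlat(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend";
        System.out.println(extractFlat(s)); // [subdata1, subdata2, subdatan]
    }
}
```

With several blocks in the input, this returns the tokens of all blocks in one list, in order of appearance; use the capturing variant above when the block boundaries matter.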
Upvotes: 1