Reputation: 19905
Given an excerpt of text like
Preface (optional, up to multiple lines)
Main : sequence1
sequence2
sequence3
sequence4
Epilogue (optional, up to multiple lines)
which Java regular expression could be used to extract all the sequences (i.e. sequence1, sequence2, sequence3, sequence4 above)? For example, a Matcher.find() loop?
Each "sequence" is preceded by and may also contain 0 or more white spaces (including tabs).
The following regex
(?m).*Main(?:[ |\t]+:(?:[ |\t]+(\S+)[\r\n])+
only yields the first sequence (sequence1).
Upvotes: 4
Views: 324
Reputation: 626804
You may use the following regex:
(?m)(?:\G(?!\A)[^\S\r\n]+|^Main\s*:\s*)(\S+)\r?\n?
Details:
(?m) - multiline mode on
(?:\G(?!\A)[^\S\r\n]+|^Main\s*:\s*) - either of the two:
    \G(?!\A)[^\S\r\n]+ - the end of the previous successful match (\G(?!\A)) and then 1+ horizontal whitespaces ([^\S\r\n]+, can be replaced with [\p{Zs}\t]+ or [\s&&[^\r\n]]+)
    | - or
    ^Main\s*:\s* - start of a line, Main, 0+ whitespaces, :, 0+ whitespaces
(\S+) - Group 1 capturing 1+ non-whitespace symbols
\r?\n? - an optional CR and an optional LF.
See the Java code below:
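import java.util.regex.Matcher;
import java.util.regex.Pattern;

String p = "(?m)(?:\\G(?!\\A)[^\\S\r\n]+|^Main\\s*:\\s*)(\\S+)\r?\n?";
String s = "Preface (optional, up to multiple lines)...\nMain : sequence1\n sequence2\n sequence3\n sequence4\nEpilogue (optional, up to multiple lines)";
Matcher m = Pattern.compile(p).matcher(s);
// Each find() continues from where the previous match ended (\G)
while (m.find()) {
    System.out.println(m.group(1)); // sequence1, sequence2, sequence3, sequence4 (one per line)
}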
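For convenience, the same find() loop can be wrapped in a small class that compiles and runs on its own. This is only a sketch: the class name SequenceExtractor and the helper extract are hypothetical names chosen for illustration, and the pattern is written with \\r and \\n escaped for the regex engine instead of embedded as literal characters (both spellings should match the same text). The sample input also puts a tab before sequence2 to exercise the "including tabs" case from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SequenceExtractor {
    // Same pattern as above; \\r and \\n are passed to the regex engine as escapes.
    private static final Pattern SEQUENCES = Pattern.compile(
            "(?m)(?:\\G(?!\\A)[^\\S\\r\\n]+|^Main\\s*:\\s*)(\\S+)\\r?\\n?");

    // Hypothetical helper: collects Group 1 of every match found by the find() loop.
    public static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SEQUENCES.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "Preface (optional, up to multiple lines)...\n"
                + "Main : sequence1\n"
                + "\t sequence2\n"
                + " sequence3\n"
                + " sequence4\n"
                + "Epilogue (optional, up to multiple lines)";
        // Prints [sequence1, sequence2, sequence3, sequence4]
        System.out.println(extract(s));
    }
}
```

The chain stops at the Epilogue line because \G must be followed by at least one horizontal whitespace character there; an epilogue line that itself starts with whitespace would be picked up as one more sequence.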
Upvotes: 3