Java -- Best way to grab ALL Strings between two regex?

Question

I have a requirement to extract content from a file that can contain multiple occurrences of a pattern. Basically, the file contains multiple sections and I want to extract each section. The extracted content should include the string matching the pattern.

E.g., file content:

01
Community based Index1- 
...some text....
...some text..
Conclusion: The significant increase of testing 
...
some text. 

02
Community based Index2- 
.some text.
.some text.
Conclusion: The significant increase of testing 
...
... 
:
:

I am trying the following patterns, but they are not working:

String patternStart = "\\d{2}[^\\d.,)][\\s:-]?[\r\n][A-Z]";
String patternEnd = "Conclusion.*(\n.*)*"; // including the entire paragraph

I am using Pattern and Matcher with these, but I get no match found.

 String regexString = Pattern.quote(patternStart) + "(.*?)" + Pattern.quote(patternEnd);
 Pattern pattern = Pattern.compile(regexString);
 // fileContent (assumed name) holds the text read from the file
 Matcher matcher = pattern.matcher(fileContent);
 while (matcher.find()) {
     String textInBetween = matcher.group(1);
 }

The fourth bird · Accepted Answer

You could use a single pattern to extract the whole section:

^\d+(?:\R(?!\d+\R|Conclusion:).*)*\RConclusion:\h+(.*(?:\R(?!\d+\R|Conclusion:).*)*)

Explanation

^ Start of string (or start of each line when the multiline flag is enabled, which is needed to match every section)
\d+ Match 1+ digits
(?: Non capture group
- \R(?!\d+\R|Conclusion:).* Match a Unicode newline sequence and the rest of the line, provided the line does not start with either 1+ digits followed by a newline, or Conclusion:
)* Close group and repeat 0+ times to match all the lines
\RConclusion:\h+ Match a newline and Conclusion: followed by 1+ horizontal whitespace chars
( Capture group 1
- .* Match the whole line
- (?:\R(?!\d+\R|Conclusion:).*)* Repeat 0+ times matching all lines that do not start with either 1+ digits followed by a newline or Conclusion:
) Close group 1

Regex demo

In Java

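String regex = "^\\d+(?:\\R(?!\\d+\\R|Conclusion:).*)*\\RConclusion: (.*(?:\\R(?!\\d+\\R|Conclusion:).*)*)";
// MULTILINE lets ^ match at the start of every line, not only at the start of the input
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);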
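A minimal sketch of how the pattern might be applied to collect every section, assuming the whole file has already been read into one String. The class name `SectionExtractor`, the two helper methods, and the sample text are hypothetical and only serve the illustration. It uses the `\h+` variant of the pattern and compiles it with `Pattern.MULTILINE`. The whole match (`group()`) is the section including its leading number, and group 1 is the conclusion paragraph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionExtractor {
    // Same pattern as above; MULTILINE makes ^ match at the start of each line.
    private static final Pattern SECTION = Pattern.compile(
        "^\\d+(?:\\R(?!\\d+\\R|Conclusion:).*)*"
            + "\\RConclusion:\\h+(.*(?:\\R(?!\\d+\\R|Conclusion:).*)*)",
        Pattern.MULTILINE);

    // Returns every full section, from its leading number through its conclusion paragraph.
    static List<String> extractSections(String text) {
        List<String> sections = new ArrayList<>();
        Matcher m = SECTION.matcher(text);
        while (m.find()) {
            // trim() drops the blank line that may trail a section
            sections.add(m.group().trim());
        }
        return sections;
    }

    // Returns only the text after "Conclusion:" for every section (capture group 1).
    static List<String> extractConclusions(String text) {
        List<String> conclusions = new ArrayList<>();
        Matcher m = SECTION.matcher(text);
        while (m.find()) {
            conclusions.add(m.group(1).trim());
        }
        return conclusions;
    }

    public static void main(String[] args) {
        // Hypothetical sample modelled on the file content in the question.
        String text = "01\n"
            + "Community based Index1-\n"
            + "...some text....\n"
            + "Conclusion: The significant increase of testing\n"
            + "...\n"
            + "some text.\n"
            + "\n"
            + "02\n"
            + "Community based Index2-\n"
            + ".some text.\n"
            + "Conclusion: The significant increase of testing\n"
            + "...\n";

        for (String section : extractSections(text)) {
            System.out.println("--- section ---");
            System.out.println(section);
        }
    }
}
```

Note that the lookahead treats any line consisting only of digits as the start of a new section, so a section body that contains such a line would be cut short.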

See a Java demo
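As for why the attempt in the question finds nothing, one likely cause is `Pattern.quote`: it wraps its argument in `\Q...\E`, so the regex is then matched as literal text rather than as a pattern. In addition, `.` does not match line breaks unless `Pattern.DOTALL` is set, so `(.*?)` cannot span several lines on its own. The sketch below illustrates only the quoting effect; the class name `QuoteDemo` and the helper `found` are hypothetical.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // True if the given regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String start = "\\d{2}";                      // regex: two digits
        String input = "01\nCommunity based Index1-";

        System.out.println(Pattern.quote(start));     // prints \Q\d{2}\E
        System.out.println(found(start, input));      // true: "01" is two digits
        // Quoted, the pattern only matches the literal characters \d{2}
        System.out.println(found(Pattern.quote(start), input));                 // false
        System.out.println(found(Pattern.quote(start), "literal \\d{2} text")); // true
    }
}
```

So `Pattern.quote` is appropriate when the delimiters are fixed strings, not when they are themselves regular expressions.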
