Reputation: 76
I am parsing a file and trying to extract multiple sections from it. One such section is called 'Report', and a single file might contain multiple reports. I wish to extract each of these 'Report' sections from the file using a regex.
Issue being faced:
There are multiple sections which end with '-----'. How do I stop at the first occurrence of it?
Current Regex:
-+(\s+)?Report(\s+)?-+\n(.*\n)+\n-{72}
This regex, unfortunately, matches all the sections as a single one, whereas I intend to stop at the first occurrence of the '-----' section-ending pattern.
Sample File:
----------- Report -----------
Lorem ipsum dolor sit amet, consectetur adipiscing elit. At hoc in eo M. Si longus, levis; Ita prorsus, inquam; Tu quidem reddes; Ratio quidem vestra sic cogit. Duo Reges: constructio interrete. Tum Torquatus: Prorsus, inquit, assentior
------------------------------
Putabam equidem satis, inquit, me dixisse. Dicimus aliquem hilare vivere; Quonam, inquit, modo? Nescio quo modo praetervolavit oratio.
----------- Report -----------
At hoc in eo M. Sed quae tandem ista ratio est? Quoniam, si dis placet, ab Epicuro loqui discimus. Venit ad extremum; Illud non continuo, ut aeque incontentae.
------------------------------
Illi enim inter se dissentiunt. Equidem e Cn. At multis malis affectus. Hoc loco tenere se Triarius non potuit. Haec dicuntur inconstantissime. Efficiens dici potest.
----------- Analysis -----------
At hoc in eo M. Sed quae tandem ista ratio est? Quoniam, si dis placet, ab Epicuro loqui discimus. Venit ad extremum; Illud non continuo, ut aeque incontentae.
----------------------------
Note:
Upvotes: 0
Views: 535
Reputation: 626689
You may use
(?s)-+\s*Report\s*-+\n(.*?)\n-{72}
Or, since the ---Report---- headers start at the start of a line:
(?sm)^-+\s*Report\s*-+\n(.*?)\n-{72}
See the regex demo
Details:
(?s) - enable DOTALL mode
-+ - 1+ hyphens
\s* - 0+ whitespaces
Report - a substring of literal chars
\s* - 0+ whitespaces
-+ - 1+ hyphens
\n - a newline
(.*?) - capturing group 1 matching any 0+ chars but as few as possible up to the first...
\n-{72} - newline followed with 72 hyphens.
Use with re.findall.
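As a rough illustration of the re.findall usage, here is a minimal sketch. It assumes the closing rule is 30 hyphens, as in the sample shown in the question, rather than the 72 used in the pattern; RULE_LEN and the shortened SAMPLE text are hypothetical stand-ins and should be adjusted to the real file. The greedy variant is included only to show why the lazy (.*?) is what stops at the first closing rule.

```python
import re

# Assumption: the sample in the question closes sections with 30 hyphens,
# while the pattern above expects 72. RULE_LEN is a hypothetical knob.
RULE_LEN = 30
RULE = "-" * RULE_LEN
HEADER_DASHES = "-" * 11

# Shortened stand-in for the sample file (placeholder bodies, same layout).
SAMPLE = (
    f"{HEADER_DASHES} Report {HEADER_DASHES}\n"
    "First report body.\n"
    f"{RULE}\n"
    "Text between sections.\n"
    f"{HEADER_DASHES} Report {HEADER_DASHES}\n"
    "Second report body,\n"
    "spanning two lines.\n"
    f"{RULE}\n"
    "More text.\n"
    f"{HEADER_DASHES} Analysis {HEADER_DASHES}\n"
    "Analysis body.\n"
    f"{RULE}\n"
)

# Lazy version: (.*?) expands only until the first closing rule is found.
lazy_pattern = r"(?sm)^-+\s*Report\s*-+\n(.*?)\n-{%d}" % RULE_LEN
reports = re.findall(lazy_pattern, SAMPLE)

# Greedy version, for comparison: (.*) runs to the last closing rule.
greedy_pattern = r"(?sm)^-+\s*Report\s*-+\n(.*)\n-{%d}" % RULE_LEN
greedy = re.findall(greedy_pattern, SAMPLE)

print(len(reports))  # 2: one entry per Report section
print(len(greedy))   # 1: everything merged into a single match
for body in reports:
    print(repr(body))
```

Because the pattern has a single capturing group, re.findall returns just the section bodies; re.finditer would be the choice if the header and closing rule are needed as well.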
Upvotes: 2