Sanath Kumar

Reputation: 76

How to match until first occurrence of a pattern?

I am parsing a file and trying to extract multiple sections from it. One such section is called 'Report', and a single file may contain multiple reports. I want to extract each of these 'Report' sections from the file using a regex.

Issue being faced:

Multiple sections end with '-----'. How do I stop at the first occurrence of it?

Current Regex:

-+(\s+)?Report(\s+)?-+\n(.*\n)+\n-{72}

Unfortunately, this regex matches all the sections as a single one, whereas I want to stop at the first occurrence of the '----' section-ending pattern.

Sample File:

----------- Report -----------

Lorem ipsum dolor sit amet, consectetur adipiscing elit. At hoc in eo M. Si longus, levis; Ita prorsus, inquam; Tu quidem reddes; Ratio quidem vestra sic cogit. Duo Reges: constructio interrete. Tum Torquatus: Prorsus, inquit, assentior

------------------------------

Putabam equidem satis, inquit, me dixisse. Dicimus aliquem hilare vivere; Quonam, inquit, modo? Nescio quo modo praetervolavit oratio.

----------- Report -----------

At hoc in eo M. Sed quae tandem ista ratio est? Quoniam, si dis placet, ab Epicuro loqui discimus. Venit ad extremum; Illud non continuo, ut aeque incontentae.

------------------------------

Illi enim inter se dissentiunt. Equidem e Cn. At multis malis affectus. Hoc loco tenere se Triarius non potuit. Haec dicuntur inconstantissime. Efficiens dici potest.

----------- Analysis -----------

At hoc in eo M. Sed quae tandem ista ratio est? Quoniam, si dis placet, ab Epicuro loqui discimus. Venit ad extremum; Illud non continuo, ut aeque incontentae.

----------------------------

Note:

  1. The ending '----' pattern has '-' repeated 72 times
  2. There is always one empty line after '--- Report ---' and before the ending '----' pattern
  3. Language being used: Python

Upvotes: 0

Views: 535

Answers (1)

Wiktor Stribiżew

Reputation: 626689

You may use

(?s)-+\s*Report\s*-+\n(.*?)\n-{72}

Or, since the '--- Report ---' headers begin at the start of a line:

(?sm)^-+\s*Report\s*-+\n(.*?)\n-{72}

See the regex demo

Details:

  • (?s) - enable DOTALL mode
  • -+ - 1+ hyphens
  • \s* - 0+ whitespaces
  • Report - a substring of literal chars
  • \s* - 0+ whitespaces
  • -+ - 1+ hyphens
  • \n - a newline
  • (.*?) - capturing group 1, matching any 0+ chars (including newlines, thanks to DOTALL) but as few as possible, up to the first...
  • \n-{72} - newline followed by 72 hyphens.
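
To see why the lazy quantifier matters, here is a minimal sketch comparing a greedy (.*) with the lazy (.*?) on a shortened, made-up sample. The sample text and the variable names are assumptions for illustration only; the 72-hyphen terminator follows the note in the question.

```python
import re

END = "-" * 72  # section terminator: 72 hyphens, as stated in the question

# Hypothetical two-report sample, shortened from the file in the question.
text = (
    "----------- Report -----------\n"
    "\n"
    "first body\n"
    "\n"
    f"{END}\n"
    "\n"
    "text between sections\n"
    "\n"
    "----------- Report -----------\n"
    "\n"
    "second body\n"
    "\n"
    f"{END}\n"
)

# Greedy: (.*) runs to the end of the text and backtracks to the LAST terminator.
greedy = re.compile(r"(?s)-+\s*Report\s*-+\n(.*)\n-{72}")
# Lazy: (.*?) expands one character at a time and stops at the FIRST terminator.
lazy = re.compile(r"(?s)-+\s*Report\s*-+\n(.*?)\n-{72}")

greedy_matches = greedy.findall(text)
lazy_matches = lazy.findall(text)

print(len(greedy_matches))  # 1 (both reports swallowed into a single match)
print(len(lazy_matches))    # 2 (one match per report)
```

With the greedy version, the single captured group also contains the text between the two sections, which is the same symptom described in the question.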

Use with re.findall.
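
A minimal sketch of the re.findall usage, assuming the anchored (?sm) variant of the pattern. The helper name extract_reports and the sample text are hypothetical; because the pattern has exactly one capturing group, re.findall returns a list of the group 1 strings.

```python
import re

END = "-" * 72  # section terminator: 72 hyphens, as stated in the question

# (?s) lets . match newlines; (?m) lets ^ match at the start of every line.
REPORT_RE = re.compile(r"(?sm)^-+\s*Report\s*-+\n(.*?)\n-{72}")


def extract_reports(text):
    """Return the body of every Report section, with surrounding blank lines stripped."""
    return [body.strip() for body in REPORT_RE.findall(text)]


# Hypothetical sample mirroring the layout of the file in the question.
sample = "\n".join([
    "----------- Report -----------",
    "",
    "first report body",
    "",
    END,
    "",
    "unrelated text",
    "",
    "----------- Report -----------",
    "",
    "second report body",
    "",
    END,
    "",
    "----------- Analysis -----------",
    "",
    "analysis body",
    "",
    END,
    "",
])

reports = extract_reports(sample)
print(reports)  # ['first report body', 'second report body']
```

The 'Analysis' section is skipped because the pattern requires the literal word Report in the header line. The .strip() call is a convenience that drops the empty lines the question says always surround the body; omit it if the raw capture is needed.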

Upvotes: 2
