BitPusher

Reputation: 1030

Python multi-line regex greedy group

I'm attempting to parse the following example text in Python:

Foo 1
foo1Text

Bar 
bar1Text

Baz 
baz1Text

Foo 2
foo2Text

Bar 
bar2Text

Baz 
baz2Text

# and so on up to Foo/Bar/Baz N

Now, the regex I'm using is:

([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)

What I'd like to do is lift out the text relevant to Foo/Bar/Baz. However, with the lazy quantifier (?) on the final group, the expression stops short and misses the text after Baz (for example baz2Text). Conversely, making that group greedy pulls everything after the first Baz into the last group.
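To make the two behaviours concrete, here is a minimal sketch that runs both variants of the pattern against the sample text. The variable names (lines, text, lazy, greedy) are only illustrative, and the sample is assumed to end with a single newline.

```python
import re

# Sample input rebuilt line by line so the trailing spaces after
# "Bar" and "Baz" are preserved exactly as in the question.
lines = [
    "Foo 1", "foo1Text", "",
    "Bar ", "bar1Text", "",
    "Baz ", "baz1Text", "",
    "Foo 2", "foo2Text", "",
    "Bar ", "bar2Text", "",
    "Baz ", "baz2Text", "",
]
text = "\n".join(lines)

# Original pattern: the last group is lazy and nothing follows it,
# so it is satisfied by matching zero characters.
lazy = re.compile(r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)")

# Same pattern with a greedy last group: it runs to the end of the input.
greedy = re.compile(r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*)")

lazy_matches = list(lazy.finditer(text))
greedy_matches = list(greedy.finditer(text))

lazy_last_groups = [m.group(5) for m in lazy_matches]
greedy_last_groups = [m.group(5) for m in greedy_matches]

print(lazy_last_groups)         # the Baz group of each lazy match
print(len(greedy_matches))      # how many records the greedy variant finds
```

A trailing lazy group has no reason to grow because nothing after it forces it to, while a trailing greedy group has no reason to stop. Either way the pattern needs something after the last group that marks where a record ends.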

I'd prefer not to use a numeric quantifier if possible, and instead broadly match things based on:

{title}
{stuff about title}

Bar
{stuff about Bar}

Baz
{stuff about Baz}

That way I can iterate through each match and extract the groups accordingly. Please note that I haven't phrased this around extracting concrete output. I'm mostly interested in getting the regex groups to represent: {title}, {stuff about title}, {stuff about Bar}, {stuff about Baz}

I was putzing around with regex101 to see if I could determine the right incantation to no avail.

This is one of those problems where it's easy enough to do manually. But then I wouldn't learn anything! :) I'd love to know if there's some cleaner mechanism / strategy I should be using here.

Thanks much

Upvotes: 1

Views: 109

Answers (1)

Brian Stephens

Reputation: 5261

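If you know that Foo is the next group after Baz, then what you need is a lookahead:

([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)(?=Foo)

A lookahead is a zero-width assertion: it requires that Foo immediately follows the match, but it does not consume any text, so the next match can start at that Foo. Note that the last record has no Foo after it, so to capture it as well the lookahead also needs to accept the end of the input, for example (?=Foo|\Z).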
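As a rough sketch of how this could be used from Python, the snippet below iterates over the matches with re.finditer and strips the surrounding whitespace from each group. It assumes the variant of the lookahead that also accepts the end of the input, (?=Foo|\Z), since the final record is not followed by another Foo. The names lines, text, pattern and records are illustrative only.

```python
import re

# Sample input rebuilt line by line so the trailing spaces after
# "Bar" and "Baz" are preserved exactly as in the question.
lines = [
    "Foo 1", "foo1Text", "",
    "Bar ", "bar1Text", "",
    "Baz ", "baz1Text", "",
    "Foo 2", "foo2Text", "",
    "Bar ", "bar2Text", "",
    "Baz ", "baz2Text", "",
]
text = "\n".join(lines)

# The lazy last group now has to grow until the lookahead succeeds:
# either the next "Foo" or the very end of the string (\Z).
pattern = re.compile(
    r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)(?=Foo|\Z)"
)

records = []
for match in pattern.finditer(text):
    # Group 2 only holds the newlines after the title, so it is ignored.
    title, _newlines, about_title, about_bar, about_baz = match.groups()
    records.append(
        (title.strip(), about_title.strip(), about_bar.strip(), about_baz.strip())
    )

for record in records:
    print(record)
```

One caveat with this approach: the lookahead is case-sensitive and looks for the literal text Foo, so it would end a record early if the body of a Baz section itself contained "Foo". Anchoring the marker to the start of a line (with re.MULTILINE and ^Foo) is one possible way to tighten that.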

Upvotes: 1
