I'm attempting to parse the following example text in Python:
Foo 1
foo1Text
Bar
bar1Text
Baz
baz1Text
Foo 2
foo2Text
Bar
bar2Text
Baz
baz2Text
# and so on up to Foo/Bar/Baz N
Now, the regex I'm using is:
([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)
Now, what I'd like to do is lift out the text relevant to Foo/Bar/Baz. However, with the lazy quantifier ? on the end of the regex, the final group matches as little as possible (nothing at all), so the expression stops short and misses the baz2Text. Conversely, making it greedy matches everything else as part of the last group.
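To see both failure modes concretely, here is a small Python sketch that runs the question's pattern, and a variant with a greedy final group, over the sample text. It assumes the sample is joined with \n and ends in a trailing newline; LAZY and GREEDY are just illustrative names.

```python
import re

# Sample input from the question (assumed: "\n" line endings, trailing newline).
TEXT = (
    "Foo 1\nfoo1Text\nBar\nbar1Text\nBaz\nbaz1Text\n"
    "Foo 2\nfoo2Text\nBar\nbar2Text\nBaz\nbaz2Text\n"
)

# The pattern from the question: the final group is lazy.
LAZY = re.compile(r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)")
# The same pattern with the final group made greedy.
GREEDY = re.compile(r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*)")

lazy_matches = [m.groups() for m in LAZY.finditer(TEXT)]
greedy_matches = [m.groups() for m in GREEDY.finditer(TEXT)]

# Lazy: nothing after the last group forces it to grow, so it matches the
# empty string; the baz text is left over and the next match starts there.
for groups in lazy_matches:
    print("lazy  :", groups)

# Greedy: the last group runs to the end of the input, giving one match.
for groups in greedy_matches:
    print("greedy:", groups)
```

The lazy run yields two matches, both with an empty last group, and the second one starts at baz1Text rather than Foo 2. The greedy run yields a single match whose last group swallows the rest of the input.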
I'd prefer not to use a numeric quantifier if possible, and would rather broadly match things based on:
{title}
{stuff about title}
Bar
{stuff about Bar}
Baz
{stuff about Baz}
That way I can iterate through each match and extract the groups accordingly. Please note, I've not phrased this around extracting concrete output. I'm mostly interested in getting the regex groups to represent: {title}, {stuff about title}, {stuff about Bar}, {stuff about Baz}.
I've been putzing around with regex101 trying to find the right incantation, to no avail.
This is one of those problems where it's easy enough to do manually. But then I wouldn't learn anything! :) I'd love to know if there's some cleaner mechanism or strategy I should be using here.
Thanks much
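If you know that Foo is the next group after Baz, then what you need is a lookahead: ([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)(?=Foo|\Z). The \Z alternative lets the final block, which has no Foo after it, match at the end of the string.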
Lookaheads are zero-width assertions: they check that the text ahead matches without consuming it, so the Foo that ends one block is still available as the start of the next match.
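A minimal Python sketch of the lookahead approach, assuming the sample text is joined with \n and ends right after the last Baz block. It uses (?=Foo|\Z) so that the final block, which has no Foo after it, also matches. The dictionary keys title, title_text, bar_text and baz_text are hypothetical names chosen to mirror the {title}/{stuff about ...} parts from the question.

```python
import re

# Sample input from the question (assumed: "\n" line endings, trailing newline).
TEXT = (
    "Foo 1\nfoo1Text\nBar\nbar1Text\nBaz\nbaz1Text\n"
    "Foo 2\nfoo2Text\nBar\nbar2Text\nBaz\nbaz2Text\n"
)

# Lookahead pattern; "|\Z" lets the last block end at the end of the string.
PATTERN = re.compile(r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)(?=Foo|\Z)")

# Hypothetical key names; .strip() drops the newlines that the lazy groups
# pick up around the "Bar" and "Baz" marker lines.
blocks = [
    {
        "title": m.group(1),
        "title_text": m.group(3).strip(),
        "bar_text": m.group(4).strip(),
        "baz_text": m.group(5).strip(),
    }
    for m in PATTERN.finditer(TEXT)
]

for block in blocks:
    print(block)
```

Because the lookahead does not consume Foo, each finditer match begins at its own title line, and the last group now holds the Baz text for every block, including baz2Text.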