Reputation: 63
I would like to pick up "Bar" from the following strings:
FooFooFoo the FooFoo the Bar Foo
FooFooFoo the FooFoo my Bar Foo
But the regex I wrote (the|my) (?P<bar>.+?) Foo
seems to be too greedy and collects more text than required (example at regex101.com)
Edit: "Bar" is just an example of the string to match. In my real case it could be made up of multiple words.
What am I doing wrong? Thanks!
I need to run this with the standard re python library.
Upvotes: 1
Views: 80
Reputation: 627100
Your main issue is that the regex engine searches for matches from left to right, so the match starts at the leftmost "my" or "the" in the string. Once that word is found, the .+? will match as few chars other than line break chars as possible, but as many as necessary to complete a valid match.
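To see that behaviour with the standard re module, here is a small sketch that runs the pattern from the question on the two sample strings. The variable names are only for illustration.

```python
import re

# Pattern exactly as written in the question.
original = re.compile(r"(the|my) (?P<bar>.+?) Foo")

samples = [
    "FooFooFoo the FooFoo the Bar Foo",
    "FooFooFoo the FooFoo my Bar Foo",
]

# The match starts at the leftmost "the", so the lazy group has to
# stretch across everything up to the first " Foo" that follows it.
captured = [original.search(s).group("bar") for s in samples]
print(captured)  # ['FooFoo the Bar', 'FooFoo my Bar']
```

The lazy quantifier only minimises how far the group extends to the right. It does not move the start of the match to a later "the" or "my".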
You need to match all text (using .*?) up to the last word (that can be matched with a \w+ pattern) before Foo:
(the|my) .*?(?P<bar>\w+) Foo
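A quick way to try the pattern above with re.search, as a sketch. The third input with "Big Bar" is a made-up example, not from the question, added to show what happens with a multi-word target.

```python
import re

# First suggested pattern: skip lazily, then capture one word before " Foo".
first_fix = re.compile(r"(the|my) .*?(?P<bar>\w+) Foo")

samples = [
    "FooFooFoo the FooFoo the Bar Foo",
    "FooFooFoo the FooFoo my Bar Foo",
]

words = [first_fix.search(s).group("bar") for s in samples]
print(words)  # ['Bar', 'Bar']

# Hypothetical multi-word input: \w+ cannot span a space,
# so only the last word before " Foo" ends up in the group.
multi = first_fix.search("FooFooFoo the FooFoo the Big Bar Foo").group("bar")
print(multi)  # Bar
```

Note that with this pattern the group always holds a single word, so it fits the one-word samples but would need adjusting if the target really spans several words.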
See the regex demo. Another variation is to match "the" or "my" as whole words and match any text up to the closest non-whitespace char chunk before Foo:
\b(the|my)\b.*?(?P<bar>\S+)\s+Foo
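The whole-word variation can be checked the same way. This is only a sketch: the "Bar-Baz" input is a hypothetical string, chosen to show where \S+ and \w+ give different results.

```python
import re

# Whole-word variation: capture a chunk of non-whitespace chars before Foo.
chunk_fix = re.compile(r"\b(the|my)\b.*?(?P<bar>\S+)\s+Foo")
# Word-based pattern, kept here only for comparison.
word_fix = re.compile(r"(the|my) .*?(?P<bar>\w+) Foo")

samples = [
    "FooFooFoo the FooFoo the Bar Foo",
    "FooFooFoo the FooFoo my Bar Foo",
]

chunks = [chunk_fix.search(s).group("bar") for s in samples]
print(chunks)  # ['Bar', 'Bar']

# Hypothetical input with a hyphenated target.
hyphenated = "FooFooFoo the FooFoo my Bar-Baz Foo"
print(chunk_fix.search(hyphenated).group("bar"))  # Bar-Baz
print(word_fix.search(hyphenated).group("bar"))   # Baz
```

Since \S+ accepts any non-whitespace char, it keeps hyphens and other punctuation inside the captured chunk, while \w+ stops at them.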
See this regex demo. Details:
\b(the|my)\b - the "the" or "my" word as a whole word
.*? - any zero or more chars other than line break chars, as few as possible
(?P<bar>\S+) - Group "bar": one or more non-whitespace chars
\s+ - one or more whitespace chars
Foo - a Foo string.
Upvotes: 1