Benjamin Saljooghi

Reputation: 80

Regular expression in Python to match multiple occurrences of a string between two known strings

In the following text

My cow always gives milk. Your cow sometimes produces milk.

I want to extract

'always gives', 'sometimes produces'

Using the encapsulating strings "cow" and "milk".

I tried "cow(.*)milk", following "Regular Expression to get a string between two strings in Javascript". However, it only works if the first sentence is on its own:

My cow always gives milk.

In my case, using that regular expression returns

always gives milk. Your cow sometimes produces 

I have also tried "(?<=foo)[^bar]*(?=bar)" from "Extracting all values between curly braces regex php", and this works great. For example (and this is closer to the actual problem I'm trying to solve):

fooSTRINGbar fooCHARACTERSbar

Returns

'STRING', 'CHARACTERS'

Great! But for some reason if "STRING" contains a character that "bar" has, then the match fails. For example,

fooSTRaINGbar

Doesn't return anything.

Upvotes: 0

Views: 196

Answers (2)

dawg

Reputation: 103744

You can also use a lookbehind and a lookahead as anchors around a non-greedy match:

>>> import re
>>> s='My cow always gives milk. Your cow sometimes produces milk.'
>>> re.findall(r'(?<=\bcow\b)(.*?)(?=\bmilk\b)', s)
[' always gives ', ' sometimes produces ']

If you want to strip the surrounding spaces:

>>> re.findall(r'(?<=\bcow\b)\s*(.*?)\s*(?=\bmilk\b)', s)
['always gives', 'sometimes produces']

You can also use a lookahead to require that the text between the two words is a single word with no spaces inside it:

>>> s='My cow gives milk. Your cow sometimes produces milk.'
>>> re.findall(r'(?<=\bcow\b)(?=\s*\S+\s*milk\b)\s*(.*?)\s*(?=\bmilk\b)', s)
['gives']

Or:

>>> re.findall(r'(?<=\bcow\b)\s*(\S+)\s*(?=\bmilk\b)', s)
['gives']

The same method works with your second example:

>>> s='fooSTRINGbar fooCHARACTERSbar'
>>> re.findall(r'(?<=foo)([A-Z]+)(?=bar)', s)
['STRING', 'CHARACTERS']
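
Note that the [A-Z]+ pattern above would not match the question's fooSTRaINGbar input, because of the lowercase a. As a minimal sketch, assuming the text between the delimiters may contain any characters: [^bar] is a character class that excludes the single characters b, a and r, not the string "bar", so a non-greedy .*? between the same lookarounds may be closer to what is wanted. The sample string and variable names here are only illustrative.

```python
import re

# Sample input combining both cases from the question (illustrative only).
s = 'fooSTRINGbar fooCHARACTERSbar fooSTRaINGbar'

# [^bar]* consumes one character at a time, refusing 'b', 'a' and 'r',
# so it cannot get past the 'a' in "STRaING" and that item is skipped.
char_class = re.findall(r'(?<=foo)[^bar]*(?=bar)', s)

# .*? takes as few characters as possible until "bar" follows,
# so individual 'b', 'a' or 'r' characters inside the text are fine.
lazy = re.findall(r'(?<=foo)(.*?)(?=bar)', s)

print(char_class)  # ['STRING', 'CHARACTERS']
print(lazy)        # ['STRING', 'CHARACTERS', 'STRaING']
```

One trade-off of the non-greedy form: the captured text ends at the first "bar" it meets, so a payload that itself contains "bar" would be cut short.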

Upvotes: 0

nneonneo

Reputation: 179392

Try the non-greedy option:

cow(.*?)milk

The ? after the * makes the quantifier non-greedy: it tells the regex engine to stop the match at the earliest opportunity, rather than greedily matching as much as possible (which is the problem you saw with cow(.*)milk).
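
A minimal sketch of the difference in Python, using the sentence from the question with re.findall (the variable names are only illustrative, and the final strip step is an assumption about the desired output):

```python
import re

s = 'My cow always gives milk. Your cow sometimes produces milk.'

# Greedy: .* runs on to the last "milk", giving one long match.
greedy = re.findall(r'cow(.*)milk', s)

# Non-greedy: .*? stops at the first "milk" after each "cow".
lazy = re.findall(r'cow(.*?)milk', s)

# Remove the spaces that surround each captured group.
stripped = [m.strip() for m in lazy]

print(greedy)    # [' always gives milk. Your cow sometimes produces ']
print(lazy)      # [' always gives ', ' sometimes produces ']
print(stripped)  # ['always gives', 'sometimes produces']
```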

Upvotes: 4
