Reputation: 80
In the following text
My cow always gives milk. Your cow sometimes produces milk.
I want to extract
'always gives', 'sometimes produces'
Using the encapsulating strings "cow" and "milk".
I tried "cow(.*)milk", following Regular Expression to get a string between two strings in Javascript. However, it only works if the first sentence is alone:
My cow always gives milk.
In my case, using that regular expression returns
always gives milk. Your cow sometimes produces
Additionally, I have also tried "(?<=foo)[^bar]*(?=bar)" from Extracting all values between curly braces regex php, and this works great. For example (and this is closer to the actual problem I'm trying to solve):
fooSTRINGbar fooCHARACTERSbar
Returns
'STRING', 'CHARACTERS'
Great! But for some reason if "STRING" contains a character that "bar" has, then the match fails. For example,
fooSTRaINGbar
Doesn't return anything.
Upvotes: 0
Views: 196
Reputation: 103744
You can also use a lookbehind and a lookahead as anchors around a non-greedy match:
>>> import re
>>> s='My cow always gives milk. Your cow sometimes produces milk.'
>>> re.findall(r'(?<=\bcow\b)(.*?)(?=\bmilk\b)', s)
[' always gives ', ' sometimes produces ']
If you want to strip the surrounding spaces:
>>> re.findall(r'(?<=\bcow\b)\s*(.*?)\s*(?=\bmilk\b)', s)
['always gives', 'sometimes produces']
You can also use a lookahead to validate that the match is a single word (contains no spaces):
>>> s='My cow gives milk. Your cow sometimes produces milk.'
>>> re.findall(r'(?<=\bcow\b)(?=\s*\S+\s*milk\b)\s*(.*?)\s*(?=\bmilk\b)', s)
['gives']
Or:
>>> re.findall(r'(?<=\bcow\b)\s*(\S+)\s*(?=\bmilk\b)', s)
['gives']
The same method works with your second example:
>>> s='fooSTRINGbar fooCHARACTERSbar'
>>> re.findall(r'(?<=foo)([A-Z]+)(?=bar)', s)
['STRING', 'CHARACTERS']
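As for why the original "(?<=foo)[^bar]*(?=bar)" fails on fooSTRaINGbar: [^bar] is a character class, so it means "any single character other than b, a, or r", not "anything other than the string bar". A rough sketch of the difference, assuming Python's re module and an input string made up for illustration:

```python
import re

# Illustrative input: the first token contains a lowercase 'a'.
s = 'fooSTRaINGbar fooCHARACTERSbar'

# [^bar]* stops at the first 'b', 'a', or 'r', so it cannot get past
# the 'a' in 'STRaING' and the lookahead for 'bar' never succeeds there.
char_class = re.findall(r'(?<=foo)[^bar]*(?=bar)', s)

# A lazy .*? between the same lookarounds accepts any characters and
# stops at the first following 'bar'.
lazy = re.findall(r'(?<=foo)(.*?)(?=bar)', s)

print(char_class)  # ['CHARACTERS']
print(lazy)        # ['STRaING', 'CHARACTERS']
```

Unlike [A-Z]+, the lazy version does not restrict which characters may appear between the delimiters, so pick whichever fits your real data.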
Upvotes: 0
Reputation: 179392
Try the non-greedy option:
cow(.*?)milk
The ? qualifier tells the regex engine to stop a match at the earliest opportunity, rather than greedily trying to match as much as possible (which is the problem you saw with cow(.*)milk).
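A quick way to see the difference, sketched in Python (the question does not name a language, so the choice of Python's re module here is an assumption):

```python
import re

s = 'My cow always gives milk. Your cow sometimes produces milk.'

# Greedy: .* runs to the last 'milk' in the string, giving one long match.
greedy = re.findall(r'cow(.*)milk', s)

# Non-greedy: .*? stops at the first 'milk' after each 'cow'.
lazy = re.findall(r'cow(.*?)milk', s)

# The captured groups keep their surrounding spaces; strip them if unwanted.
trimmed = [m.strip() for m in lazy]

print(greedy)   # [' always gives milk. Your cow sometimes produces ']
print(lazy)     # [' always gives ', ' sometimes produces ']
print(trimmed)  # ['always gives', 'sometimes produces']
```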
Upvotes: 4