For example, given a string like this:
I like bla blab blah chocolate I like bla blob bla cheese
I'd like to find every substring that starts with "I like", followed by some text and then the value (the last word before the next "I like").
My problem is that the pattern matches, but .* takes everything up to the end of the string, so I get one match instead of two.
In [37]: s = 'I like bla blab blah chocolate I like bla blob bla cheese'
In [38]: p = re.compile(r'(I like) .* (\w+)', re.IGNORECASE)
In [39]: p.findall(s)
Out[39]: [('I like', 'cheese')]
I am expecting:
[('I like', 'chocolate'), ('I like', 'cheese')]
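For illustration, here is a minimal sketch comparing the greedy pattern from the question with a version that only swaps .* for the lazy .*? (the names greedy and lazy are just for this example). Laziness alone does not fix it: .*? stops at the first position where the rest of the pattern can match, so (\w+) captures the second word after "I like" instead of the last word before the next "I like".

```python
import re

s = 'I like bla blab blah chocolate I like bla blob bla cheese'

# Greedy: .* runs to the end of the string, then backtracks just enough
# for " (\w+)" to match the final word, so only one match is found.
greedy = re.compile(r'(I like) .* (\w+)', re.IGNORECASE)
print(greedy.findall(s))  # [('I like', 'cheese')]

# Lazy: .*? grows one character at a time and stops at the first
# position where " (\w+)" can match, i.e. right after the first word.
lazy = re.compile(r'(I like) .*? (\w+)', re.IGNORECASE)
print(lazy.findall(s))  # [('I like', 'blab'), ('I like', 'blob')]
```

To get the last word of each segment, the match also needs to know where the segment ends, which is what a lookahead for the next "I like" (or the end of the string) provides.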
You can use
\b(I like)\b.*?(\w+)(?=\s*(?:\bI like\b|$))
See the regex demo. Details:
\b(I like)\b - Group 1: I like, matched as whole words
.*? - any zero or more chars other than line break chars, as few as possible
(\w+) - Group 2: one or more word chars (Unicode letters, digits and _)
(?=\s*(?:\bI like\b|$)) - a positive lookahead that matches a location in the string immediately followed by:
  \s* - zero or more whitespaces
  (?:\bI like\b|$) - either I like as whole words, or the end of the string
See a Python demo:
import re
s = 'I like bla blab blah chocolate I like bla blob bla cheese'
print(re.findall(r'\b(I like)\b.*?(\w+)(?=\s*(?:\bI like\b|$))', s))
# => [('I like', 'chocolate'), ('I like', 'cheese')]
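The same regex can also be written with re.VERBOSE so that each piece of the breakdown above sits next to its own comment. This is just a sketch (the variable name pattern is for illustration only). In verbose mode unescaped whitespace in the pattern is ignored, so the literal space inside "I like" has to be escaped as "\ ".

```python
import re

s = 'I like bla blab blah chocolate I like bla blob bla cheese'

# re.VERBOSE ignores unescaped whitespace and allows # comments,
# so the space inside "I like" is written as "\ ".
pattern = re.compile(r"""
    \b(I\ like)\b          # Group 1: "I like" as whole words
    .*?                    # any chars except newline, as few as possible
    (\w+)                  # Group 2: one or more word characters
    (?=                    # lookahead: only match if followed by
        \s*                #   optional whitespace, then
        (?:\bI\ like\b|$)  #   another "I like" or the end of the string
    )
""", re.VERBOSE)

print(pattern.findall(s))  # [('I like', 'chocolate'), ('I like', 'cheese')]
```

Because the lookahead does not consume the next "I like", findall can start the following match right there, which is why both segments are found.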