Mike

Reputation: 83

How to find repetitive patterns in a string with regular expressions?

For example, given a string such as

I like bla blab blah chocolate I like bla blob bla cheese

I'd like to find every substring that starts with "I like", is followed by some text, and ends with the value (the last word before the next "I like" or the end of the string).

My problem is that the pattern does match, but .* consumes everything up to the end of the string, so I get one match instead of two.

In [37]: s = 'I like bla blab blah chocolate I like bla blob bla cheese'

In [38]: p = re.compile(r'(I like) .* (\w+)', re.IGNORECASE)

In [39]: p.findall(s)
Out[39]: [('I like', 'cheese')]

I am expecting: [('I like', 'chocolate'), ('I like', 'cheese')]

Upvotes: 2

Views: 143

Answers (1)

Wiktor Stribiżew

Reputation: 626748

You can use

\b(I like)\b.*?(\w+)(?=\s*(?:\bI like\b|$))

See the regex demo. Details:

  • \b(I like)\b - Group 1, I like matched as whole words
  • .*? - zero or more characters other than line break characters, as few as possible
  • (\w+) - Group 2: one or more word characters (letters, digits or _)
  • (?=\s*(?:\bI like\b|$)) - a positive lookahead that matches a location in the string that is immediately followed by
    • \s* - zero or more whitespace characters
    • (?:\bI like\b|$) - either I like as whole words or the end of the string.
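
To see why both the lazy quantifier and the lookahead are needed, here is a small sketch comparing three variants on the string from the question. The variable names (greedy, lazy_only, lazy_with_lookahead) are illustrative only and are not part of the original post:

```python
import re

s = 'I like bla blab blah chocolate I like bla blob bla cheese'

# Greedy .* runs to the end of the string and backtracks only as far as
# needed, so the whole string becomes a single match.
greedy = re.findall(r'(I like) .* (\w+)', s)

# Lazy .*? on its own stops as early as possible, i.e. at the first word
# right after "I like", which is not the word we want.
lazy_only = re.findall(r'\b(I like)\b.*?(\w+)', s)

# Lazy .*? plus the lookahead: the captured word must be followed by
# the next "I like" or by the end of the string.
lazy_with_lookahead = re.findall(
    r'\b(I like)\b.*?(\w+)(?=\s*(?:\bI like\b|$))', s
)

print(greedy)               # [('I like', 'cheese')]
print(lazy_only)            # [('I like', 'bla'), ('I like', 'bla')]
print(lazy_with_lookahead)  # [('I like', 'chocolate'), ('I like', 'cheese')]
```

The lazy quantifier keeps each match from running past the next "I like", and the lookahead forces Group 2 to be the last word of each segment without consuming the text that starts the next match.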

See a Python demo:

import re
s = 'I like bla blab blah chocolate I like bla blob bla cheese'
print( re.findall(r'\b(I like)\b.*?(\w+)(?=\s*(?:\bI like\b|$))', s) )
# => [('I like', 'chocolate'), ('I like', 'cheese')]
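
The pattern in the question was compiled with re.IGNORECASE, which the demo above leaves out. If case-insensitive matching is wanted, a minimal sketch could look like this. The helper name liked_items and the mixed-case test string are assumptions made for illustration:

```python
import re

# Same pattern as in the answer, compiled once with the IGNORECASE flag
# used in the question. The flag also applies to "I like" inside the lookahead.
pattern = re.compile(
    r'\b(I like)\b.*?(\w+)(?=\s*(?:\bI like\b|$))',
    re.IGNORECASE,
)

def liked_items(text):
    """Return only the Group 2 values (hypothetical helper, not from the post)."""
    return [m.group(2) for m in pattern.finditer(text)]

result = liked_items('I like bla blab blah chocolate i LIKE bla blob bla cheese')
print(result)  # ['chocolate', 'cheese']
```

Note that . does not match line breaks by default, so if the text between "I like" and the value can span several lines, adding re.DOTALL to the flags is probably needed as well.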

Upvotes: 1
