Tarek Eldeeb

Reputation: 606

Python regex to match many tokens in sequence

I have a test string that looks like

These are my food preferences mango and I also like bananas and I like grapes too.

I am trying to write a regex in Python that returns the text according to these rules:

My current expression is (live demo: https://regex101.com/r/1CSSNc/1/):

(?P<Start>\bpreferences\b)(?:\s*(?:(?P<Name>\w*)\s*){1,7}like)*?(\s*(?P<Last>\w*\s*){1,7})

which returns

Match 1     18-64   preferences mango and I also like bananas and 
Group Start 18-29   preferences
Group 3     29-64    mango and I also like bananas and 
Group Last  60-64   and 

I expected/wanted the output to be:

Match 1     18-64   preferences mango .. grapes too
Group Start 18-29   preferences
Group 3     29-64    mango and I also 
Group 4     xx xx    bananas and I
Group Last  60-64    grapes too

My implementation is missing some concepts here.

Upvotes: 0

Views: 66

Answers (1)

Wiktor Stribiżew

Reputation: 626903

You can use

(?P<Start>\bpreferences\b)(?P<Mid>(?:\s+\w+(?:\s+\w+){0,6}?\s+like)+)(?:\s+(?P<Last>\w+(?:\s+\w+){1,7}))?

See the regex demo.

Details:

  • (?P<Start>\bpreferences\b) - Group "Start": a whole word preferences
  • (?P<Mid>(?:\s+\w+(?:\s+\w+){0,6}?\s+like)+) - Group "Mid": one or more repetitions of
    • \s+ - one or more whitespaces
    • \w+(?:\s+\w+){0,6}? - one or more word chars and then zero to six occurrences of one or more whitespaces and then one or more word chars, as few as possible
    • \s+like - one or more whitespaces and then the word like
  • (?:\s+(?P<Last>\w+(?:\s+\w+){1,7}))? - an optional occurrence of
    • \s+ - one or more whitespaces
    • (?P<Last>\w+(?:\s+\w+){1,7}) - Group "Last": one or more word chars and then one to seven occurrences of one or more whitespaces and one or more word chars
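The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch, assuming you are free to compile the pattern with re.VERBOSE; the constant name PATTERN is only illustrative and the pattern is otherwise unchanged:

```python
import re

# Same pattern as above, laid out with re.VERBOSE so each part carries a comment.
# PATTERN is a hypothetical name chosen for this sketch.
PATTERN = re.compile(
    r"""
    (?P<Start>\bpreferences\b)          # Group Start: the whole word preferences
    (?P<Mid>                            # Group Mid: one or more chunks ending in like
        (?:
            \s+\w+                      # whitespace, then the first word of a chunk
            (?:\s+\w+){0,6}?            # up to six more words, as few as possible
            \s+like                     # whitespace, then the word like
        )+
    )
    (?:
        \s+                             # whitespace before the trailing words
        (?P<Last>\w+(?:\s+\w+){1,7})    # Group Last: two to eight trailing words
    )?
    """,
    re.VERBOSE,
)

text = "These are my food preferences mango and I also like bananas and I like grapes too."
match = PATTERN.search(text)
if match:
    print(match.group("Start"))
    print(repr(match.group("Mid")))
    print(match.group("Last"))
```

With re.VERBOSE, unescaped whitespace and # comments inside the pattern are ignored, so the layout does not change what is matched.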

See the Python demo:

import re

text = "These are my food preferences mango and I also like bananas and I like grapes too."
pattern = r"(?P<Start>\bpreferences\b)(?P<Mid>(?:\s+\w+(?:\s+\w+){0,6}?\s+like)+)(?:\s+(?P<Last>\w+(?:\s+\w+){1,7}))?"
match = re.search(pattern, text)
if match:
    print(match.group("Start"))
    # "Mid" holds every chunk up to the last "like"; split it on the word "like"
    print(re.split(r"\s*\blike\b\s*", match.group("Mid").strip()))
    print(match.group("Last"))

Output:

preferences
['mango and I also', 'bananas and I', '']
grapes too
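The trailing '' in the list appears because the Mid group always ends with the word like, so the split leaves an empty piece after it. Splitting is used at all because Python's re module keeps only the last capture of a repeated group. Below is a small sketch of both points; the helper name split_chunks is made up for this example:

```python
import re

def split_chunks(mid):
    """Split a Mid capture on the word 'like' and drop empty pieces."""
    parts = re.split(r"\s*\blike\b\s*", mid.strip())
    return [part for part in parts if part]

chunks = split_chunks(" mango and I also like bananas and I like")
print(chunks)

# A repeated capturing group only remembers its final iteration:
# the group below matches a, b and c in turn, but group(1) returns only the last one.
last_only = re.search(r"(?:(\w+),)+", "a,b,c,").group(1)
print(last_only)
```

If the empty string is harmless in your code, the original split can stay as it is.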

Upvotes: 1
