Josh M.

Reputation: 11

Regular expression to match with optional following text

I'm very new to regular expressions and I need some help finding the correct regular expression.

I have a text file of the form:

apple 4
bananas 5
bananas 5 7
apple 3
apple 6
bananas 3
bananas 4 5
apple 3
bananas 9 

I am looking for a regular expression that will match the last occurrence of "bananas.*" after each "apple.*", keeping in mind that an "apple.*" may not be followed by any "bananas.*" at all. The regex should match the following:

bananas 5 7 
bananas 4 5
bananas 9

Thanks in advance. I am doing this in Python, if that helps.

Upvotes: 0

Views: 71

Answers (2)

Billy

Reputation: 5609

There's nothing that needs to be recursive. Here's a pattern that will work:

>>> import re
>>> fruit_list = """apple 4
bananas 5
bananas 5 7
apple 3
apple 6
bananas 3
bananas 4 5
apple 3
bananas 9"""

>>> re.findall(r'apple\s*\d*\s*\n(?:bananas\s*(?:\d+\s*)+\n)*(bananas(?:\s*\d+)+)\s*', fruit_list)
['bananas 5 7', 'bananas 4 5', 'bananas 9']

As many of the comments mention, a regex might not be the best way to get what you're trying to find. Iterating over the lines, testing line.startswith('apple') and then line.startswith('bananas') for each subsequent line, might be a better way.
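
A minimal sketch of that line-by-line approach, in case it helps. The function name last_bananas_per_apple and the variable names are made up for illustration, and it assumes every relevant line starts with either "apple" or "bananas":

```python
def last_bananas_per_apple(text):
    """Return the last 'bananas' line that follows each 'apple' line, if any."""
    results = []
    last = None        # most recent bananas line seen since the last apple line
    seen_apple = False  # ignore bananas lines that come before any apple line
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('apple'):
            # A new apple line closes the previous block.
            if last is not None:
                results.append(last)
            last = None
            seen_apple = True
        elif seen_apple and line.startswith('bananas'):
            last = line
    # Close the final block at end of input.
    if last is not None:
        results.append(last)
    return results


sample = """apple 4
bananas 5
bananas 5 7
apple 3
apple 6
bananas 3
bananas 4 5
apple 3
bananas 9 """

result = last_bananas_per_apple(sample)
print(result)
```

An apple line with no bananas after it (like the first "apple 3" here) simply contributes nothing to the result.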

Upvotes: 0

Jan

Reputation: 43199

It actually is possible with regular expressions:

^apple.+[\n\r]
(?:(bananas.*)[\n\r]?)+

See a demo on regex101.com. Mind the different modifiers (multiline and verbose), and use group 1 of every match.

As full Python code:

import re

string = """
apple 4
bananas 5
bananas 5 7
apple 3
apple 6
bananas 3
bananas 4 5
apple 3
bananas 9 
"""

rx = re.compile(r"""
        ^apple.+[\n\r]
        (?:(bananas.*)[\n\r]?)+
        """, re.MULTILINE | re.VERBOSE)

bananas = [m.group(1) for m in rx.finditer(string)]
print(bananas)

See a demo on ideone.com.
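
Why group 1 ends up holding the last bananas line: in Python's re module, a capturing group inside a repeated group is overwritten on each iteration, so after the match it contains only the text from the final iteration. A small sketch of just that behaviour (the variable names block and m are only for illustration):

```python
import re

block = "bananas 3\nbananas 4 5\n"

# The inner group (bananas.*) is captured once per iteration of the outer (?:...)+,
# and each new iteration replaces the previous capture.
m = re.match(r"(?:(bananas.*)\n?)+", block)

print(m.group(0))  # the whole run of bananas lines
print(m.group(1))  # only the last bananas line
```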

Upvotes: 1
