Reputation: 11
I'm very new to regular expressions and I need some help finding the correct regular expression.
I have a text file of the form:
apple 4
bananas 5
bananas 5 7
apple 3
apple 6
bananas 3
bananas 4 5
apple 3
bananas 9
I am looking for a regular expression that will match the last occurrence of "bananas.*" after each "apple.*", keeping in mind that for every "apple.*" there may be no "bananas.*". The regex should match the following:
bananas 5 7
bananas 4 5
bananas 9
Thanks in advance. I am doing this in Python, if that helps.
Upvotes: 0
Views: 71
Reputation: 5609
There's nothing that needs to be recursive. Here's a pattern that will work:
>>> import re
>>> fruit_list = """apple 4
bananas 5
bananas 5 7
apple 3
apple 6
bananas 3
bananas 4 5
apple 3
bananas 9"""
>>> re.findall(r'apple\s*\d*\s*\n(?:bananas\s*(?:\d+\s*)+\n)*(bananas(?:\s*\d+)+)\s*', fruit_list)
['bananas 5 7', 'bananas 4 5', 'bananas 9']
And as many of the comments mention, regex might not be the best way to get what you're trying to find. Iterating over each line and testing line.startswith('apple'), then line.startswith('bananas') for each subsequent line, might be a better way.
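For what it's worth, here is a rough sketch of that line-by-line approach. The function name last_bananas_per_apple and the variable names are just ones I picked for illustration, and I'm assuming every line starts with either "apple" or "bananas" as in the sample:

```python
def last_bananas_per_apple(lines):
    """Collect the last 'bananas' line that follows each 'apple' line."""
    results = []
    last = None        # most recent 'bananas' line seen in the current group
    in_group = False   # becomes True once the first 'apple' line is seen
    for line in lines:
        line = line.strip()
        if line.startswith('apple'):
            # A new group starts: flush the previous group's last bananas line.
            if last is not None:
                results.append(last)
            last = None
            in_group = True
        elif in_group and line.startswith('bananas'):
            last = line
    # Flush the final group, which has no following 'apple' line.
    if last is not None:
        results.append(last)
    return results


text = """apple 4
bananas 5
bananas 5 7
apple 3
apple 6
bananas 3
bananas 4 5
apple 3
bananas 9"""

print(last_bananas_per_apple(text.splitlines()))
# ['bananas 5 7', 'bananas 4 5', 'bananas 9']
```

An "apple" line with no "bananas" after it simply contributes nothing, which matches the requirement in the question.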
Upvotes: 0
Reputation: 43199
It actually is possible with regular expressions:
^apple.+[\n\r]
(?:(bananas.*)[\n\r]?)+
See a demo on regex101.com; mind the different modifiers and use group 1 of every match.
Python code:
import re
string = """
apple 4
bananas 5
bananas 5 7
apple 3
apple 6
bananas 3
bananas 4 5
apple 3
bananas 9
"""
rx = re.compile(r"""
^apple.+[\n\r]
(?:(bananas.*)[\n\r]?)+
""", re.MULTILINE | re.VERBOSE)
bananas = [m.group(1) for m in rx.finditer(string)]
print(bananas)
See a demo on ideone.com.
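In case it's unclear why group 1 ends up holding the last bananas line: when a capturing group sits inside a repeated group, Python's re module keeps only the text from the final iteration. A tiny illustration with a made-up input string (not the fruit data):

```python
import re

# The inner group (\w+) is matched three times here,
# but group(1) only retains what the last iteration captured.
m = re.match(r'(?:(\w+)\s?)+', 'one two three')
last_word = m.group(1)
print(last_word)  # three
```

The pattern above relies on the same behaviour, with (bananas.*) as the repeated capture.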
Upvotes: 1