Ernest A

Reputation: 7839

Capture both of two optional groups

I want to capture the word preceding the sequence ' AE' (if any) or the word preceding the sequence ' BE' (again, if any), or both words if both sequences appear in a string.

I tried the following regular expression:

import re

TEST = re.compile(
    r'(.*?)'
    r'(?:(\w+) AE)?'
    r'.*?'
    r'(?:(\w+) BE)?')

It captures either a word preceding ' BE' or a word preceding ' AE' but not both words.

>>> TEST.match('').groups()
('', None, None)
>>> TEST.match('foo AE').groups()
('', 'foo', None)
>>> TEST.match('foo BE').groups()
('', None, 'foo')
>>> TEST.match('foo AE bar BE').groups()
('', 'foo', None)

Instead, I would like the last line of output to be:

>>> TEST.match('foo AE bar BE').groups()
('', 'foo', 'bar')

Upvotes: 0

Views: 58

Answers (1)

falsetru

Reputation: 369304

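Use findall on the compiled pattern (RegexObject.findall). Note that the group captures everything between the previous match and the marker, not only the single preceding word: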

>>> import re
>>> pattern = re.compile(r'\s*(.*?)\s*(?:AE|BE)')
>>>
>>> pattern.findall('')
[]
>>> pattern.findall('bar BE')
['bar']
>>> pattern.findall('foo AE')
['foo']
>>> pattern.findall('foo AE bar BE')
['foo', 'bar']
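
If you need to know which word belongs to which marker, as in the original (group 2, group 3) layout, one option is to search for each marker independently. The original pattern stops early because once 'foo AE' has matched, the lazy `.*?` and the optional BE group can both match the empty string, so the engine has no reason to advance to ' BE'. The following is only a sketch: the helper name `words_before_markers` and the two pattern names are made up for illustration, and the trailing `\b` is an added assumption so that ' AE' is not matched inside a longer token such as ' AEX'.

```python
import re

# Hypothetical names; one compiled pattern per marker.
# (\w+) captures only the word directly before the marker.
AE_WORD = re.compile(r'(\w+) AE\b')
BE_WORD = re.compile(r'(\w+) BE\b')


def words_before_markers(text):
    """Return (word before ' AE', word before ' BE'), with None where absent."""
    ae = AE_WORD.search(text)
    be = BE_WORD.search(text)
    return (ae.group(1) if ae else None,
            be.group(1) if be else None)


for sample in ('', 'foo AE', 'foo BE', 'foo AE bar BE'):
    print(repr(sample), words_before_markers(sample))
```

Because the two searches are independent, this also works when ' BE' appears before ' AE' in the string.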

Upvotes: 1
