Reputation: 7839
I want to capture the word preceding the sequence ' AE' (if any), or the word preceding the sequence ' BE' (again, if any), or both words if both sequences appear in a string.
I tried with the following regular expression:
import re

TEST = re.compile(
    r'(.*?)'
    r'(?:(\w+) AE)?'
    r'.*?'
    r'(?:(\w+) BE)?')
It captures either a word preceding ' BE' or a word preceding ' AE', but not both words.
>>> TEST.match('').groups()
('', None, None)
>>> TEST.match('foo AE').groups()
('', 'foo', None)
>>> TEST.match('foo BE').groups()
('', None, 'foo')
>>> TEST.match('foo AE bar BE').groups()
('', 'foo', None)
Instead, I would like the last call to return:
>>> TEST.match('foo AE bar BE').groups()
('', 'foo', 'bar')
Upvotes: 0
Views: 58
Reputation: 369304
Using RegexObject.findall (Pattern.findall in Python 3):
>>> pattern = re.compile(r'\s*(.*?)\s*(?:AE|BE)')
>>>
>>> pattern.findall('')
[]
>>> pattern.findall('bar BE')
['bar']
>>> pattern.findall('foo AE')
['foo']
>>> pattern.findall('foo AE bar BE')
['foo', 'bar']
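As for why the original pattern stops early: on 'foo AE bar BE', the lazy `.*?` between the two optional groups is allowed to match the empty string, and the optional BE group can also match nothing, so the match already succeeds right after 'foo AE' and the engine never looks further. Note also that the `findall` pattern above captures everything before a marker (not just one word) and does not say which marker a word belonged to. If that matters, a variant along these lines might help. It is only a sketch: `MARKED_WORD` and `words_before_markers` are names made up for illustration, and it assumes a single space between the word and the marker, and that only the first word per marker is wanted.

```python
import re

# Hypothetical pattern: one word, a single space, then one of the two markers.
# The trailing \b keeps 'AE' from matching inside a longer token such as 'AEX'.
MARKED_WORD = re.compile(r'(\w+) (AE|BE)\b')


def words_before_markers(text):
    """Return a dict mapping 'AE' and 'BE' to the word preceding it, or None."""
    found = {'AE': None, 'BE': None}
    # With two groups in the pattern, findall yields (word, marker) tuples.
    for word, marker in MARKED_WORD.findall(text):
        # Assumption: keep only the first word seen for each marker.
        if found[marker] is None:
            found[marker] = word
    return found
```

For example, `words_before_markers('foo AE bar BE')` gives a dict with 'foo' under 'AE' and 'bar' under 'BE', and a string without either marker leaves both values as None.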
Upvotes: 1