Reputation: 65
I want to match:
first second
and
second first
so the regular expression:
re.match(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))', 'first second')
matches, but this one:
re.match(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))', 'second first')
does not match. Is this a bug with backreferences in an A|B alternation?
Upvotes: 0
Views: 96
Reputation: 91430
How about:
(?=.*(?P<f>first))(?=.*(?P<s>second))
(?=...) is a positive lookahead: it asserts that the word first is present somewhere in the string without making it part of the match (it's a zero-length assertion). The same goes for second.
This regex matches if both first and second appear in the string, in any order.
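A quick way to check this pattern, as a minimal sketch (the helper name has_both is made up for illustration). Because the lookaheads consume nothing, the match itself is empty, and any extra text around the two words is accepted as well.

```python
import re

# Two zero-width lookaheads: each scans ahead from the start for its word.
BOTH = re.compile(r'(?=.*(?P<f>first))(?=.*(?P<s>second))')

def has_both(text):
    """Return True if both words occur somewhere in text (hypothetical helper)."""
    return BOTH.match(text) is not None

print(has_both('first second'))    # True
print(has_both('second first'))    # True
print(has_both('first only'))      # False

m = BOTH.match('second first')
print(repr(m.group()))             # '' (lookaheads consume no characters)
print(m.group('f'), m.group('s'))  # first second
```

If the string must contain nothing but the two words, this approach is probably too permissive on its own.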
Upvotes: 0
Reputation: 1122242
You've misunderstood how backreferences work. For a backreference to match anything, the group it refers to must already have matched.
In your second example, the (?P<f>first) group didn't match anything, so the (?P=f) backreference cannot match anything either.
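To see this concretely, here is a small sketch that reuses the pattern from the question. In Python's re module, a backreference to a group that has not taken part in the match simply fails.

```python
import re

pattern = re.compile(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))')

# First branch succeeds, so both groups are set.
print(pattern.match('first second').groupdict())
# {'f': 'first', 's': 'second'}

# First branch fails; in the second branch neither group has captured
# anything, so (?P=s) cannot match and the whole pattern fails.
print(pattern.match('second first'))
# None
```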
Backreferences are the wrong tool here; you'll have to repeat at least one of your groups literally:
r'(?:(?P<f>first )?(?P<s>second)(?(f)$| first))'
This uses a conditional pattern that only matches first after second if there was no f match before second; when f did match, it requires the end of the string instead:
>>> import re
>>> pattern = re.compile(r'(?:(?P<f>first )?(?P<s>second)(?(f)$| first))')
>>> pattern.match('first second').group()
'first second'
>>> pattern.match('second first').group()
'second first'
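If the conditional feels too clever, a plainer option is to spell out both orders in full. This is a sketch of an alternative, not the approach above. Python's re does not allow the same group name twice in one pattern, so the group names f1, s1, s2, f2 and the helper words are made up here.

```python
import re

# Each branch repeats both words literally; no backreferences needed.
ORDERED = re.compile(
    r'(?P<f1>first) (?P<s1>second)|(?P<s2>second) (?P<f2>first)'
)

def words(text):
    """Return (first, second) captures, or None if text matches neither order."""
    m = ORDERED.fullmatch(text)
    if m is None:
        return None
    # Exactly one branch matched; pick whichever groups are set.
    return (m.group('f1') or m.group('f2'), m.group('s1') or m.group('s2'))

print(words('first second'))  # ('first', 'second')
print(words('second first'))  # ('first', 'second')
print(words('first first'))   # None
```

The cost is some duplication in the pattern; the benefit is that each branch can be read on its own.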
Upvotes: 2