Matteo

Reputation: 65

Bug in Python's re module (backreference)?

I want to match:

first second

and

second first

so the regular expression:

re.match(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))', 'first second')

matches, but this one:

re.match(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))', 'second first')

does not match. Is this a bug with backreferences in A|B?

Upvotes: 0

Views: 96

Answers (2)

Toto

Reputation: 91430

How about:

(?=.*(?P<f>first))(?=.*(?P<s>second))

(?=...) is a positive lookahead: it asserts that the word first is present somewhere ahead in the string without making it part of the match (it's a zero-length assertion). The same goes for second.

This regex matches if both first and second appear in the string, in any order.
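Here is a rough sketch of how that behaves with re.match. The sample strings and variable names are only for illustration. Note that the overall match is empty because lookaheads consume nothing, so the words have to be read from the named groups, and that any string containing both words is accepted, not just the two exact phrases.

```python
import re

# Both lookaheads start at position 0 and scan ahead without consuming text.
pattern = re.compile(r'(?=.*(?P<f>first))(?=.*(?P<s>second))')

both_orders = [pattern.match(text) for text in ('first second', 'second first')]
missing_word = pattern.match('first only')  # no "second" anywhere, so no match

for m in both_orders:
    # The match itself is zero-length; the groups hold the words found.
    print(repr(m.group()), m.groupdict())
print(missing_word)
```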

Upvotes: 0

Martijn Pieters

Reputation: 1122242

You've misunderstood how backreferences work. For a backreference to match anything, the group it refers to must already have matched.

In your second example, the (?P<f>first) group didn't match anything, so the (?P=f) backreference cannot match anything either.
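To see this concretely, here is a minimal sketch that runs the pattern from the question against both inputs. It uses only the standard re module; the variable names are just for illustration.

```python
import re

# Pattern copied from the question.
pattern = re.compile(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))')

# First alternative: both groups capture their text directly.
forward = pattern.match('first second')
print(forward.group())

# Second alternative: f and s were never captured on this path,
# so both backreferences fail and there is no match at all.
backward = pattern.match('second first')
print(backward)
```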

Backreferences are the wrong tool here; you'll have to repeat at least one of your groups literally:

r'(?:(?P<f>first )?(?P<s>second)(?(f)| first))'

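This uses a conditional pattern that matches first after second only if there was no f match before second. The session below also adds $ to the yes-branch, so that when f did match, second must come at the end of the string: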

>>> import re
>>> pattern = re.compile(r'(?:(?P<f>first )?(?P<s>second)(?(f)$| first))')
>>> pattern.match('first second').group()
'first second'
>>> pattern.match('second first').group()
'second first'
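The two patterns in this answer differ only in the $ inside the conditional. As a hedged illustration of what that changes, the sketch below feeds both variants an input with extra trailing text; the names loose and strict and the sample string are my own.

```python
import re

# Variant without the anchor in the yes-branch.
loose = re.compile(r'(?:(?P<f>first )?(?P<s>second)(?(f)| first))')
# Variant with `$` in the yes-branch, as used in the session above.
strict = re.compile(r'(?:(?P<f>first )?(?P<s>second)(?(f)$| first))')

text = 'first second first'

# Without `$`, the leading "first second" is accepted and the rest is ignored.
loose_match = loose.match(text)
print(loose_match.group())

# With `$`, the yes-branch requires the end of the string after "second",
# and the fallback (skipping f) cannot match "second" at position 0.
strict_match = strict.match(text)
print(strict_match)
```

If trailing text should be rejected in the second first case as well, re.fullmatch may be the simpler choice.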

Upvotes: 2
