Reputation: 113
I am trying to combine several regexes to match dates.
For example, I have
regex1: (3 groups inside, which are 'month', 'day', 'year')
(?:(?P<month>\d{1,2})[/-](?P<day>\d{1,2})[/-](?P<year>\d{2,4}))
regex2: (also 3 groups inside)
(?P<day>\d{1,2}) (?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z.]*[,.]? (?P<year>\d{4})
and a lot of other regexes.
I have tested them one by one. Now I want to combine them into one regex with a lot of "|"s between them.
I have tried:
import re
regexes = re.compile('regex1 here |'
'regex2 here |'
'regex3 here')
but it raises an error like:
redefinition of group name 'month' as group 4; was group 1 at position 59
My guess is that a group name can only occur once in a pattern?
So, how can I combine all these regexes into one with named groups?
Upvotes: 1
Views: 213
Reputation: 18960
The key to solving this is to use a branch reset group, which starts with (?| and is itself a non-capturing group.
Each alternative inside the parentheses reuses the same numbers for its capturing groups. This also works for named capture groups, as long as groups with the same name have the same index, or the groups at that index are all unnamed.
However, Python's built-in re module does not support this PCRE feature. To use it you need the third-party regex module (pip install regex):
import regex as re
regex = r"(?|(?:(?P<month>\d{1,2})[\/-](?P<day>\d{1,2})[\/-](?P<year>\d{2,4}))|(?P<day>\d{1,2}) (?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z.]*[,.]? (?P<year>\d{4}))"
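If installing the third-party regex module is not an option, a different technique works with the built-in re module: give every alternative its own group names and merge them after matching. The following is only a minimal sketch of that idea, not a branch reset. The numeric suffixes (month1, month2, ...) and the helper name parse_date are assumptions made for illustration, and the two patterns are the ones from the question.

```python
import re

# Each alternative gets its own numeric suffix so that no group name repeats.
# The suffix convention (month1, month2, ...) is an assumption of this sketch.
ALTERNATIVES = [
    r"(?P<month1>\d{1,2})[/-](?P<day1>\d{1,2})[/-](?P<year1>\d{2,4})",
    r"(?P<day2>\d{1,2}) (?P<month2>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    r"[a-z.]*[,.]? (?P<year2>\d{4})",
]

# Wrap every alternative in a non-capturing group and join them with "|".
COMBINED = re.compile("|".join(f"(?:{alt})" for alt in ALTERNATIVES))


def parse_date(text):
    """Return a dict with 'month', 'day' and 'year' for the first date found, or None."""
    match = COMBINED.search(text)
    if match is None:
        return None
    result = {}
    for name, value in match.groupdict().items():
        # Groups from alternatives that did not take part in the match are None.
        if value is not None:
            # Strip the trailing digits to recover the shared name.
            result[name.rstrip("0123456789")] = value
    return result


print(parse_date("3/14/2015"))
print(parse_date("14 March, 2015"))
```

Only the groups of the alternative that actually matched are kept, so the caller sees plain 'month', 'day' and 'year' keys regardless of which date format was found.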
PS: I have not inspected your patterns closely, but as others have hinted, there is room for improvement. That is another question, though.
Upvotes: 1