Matthew May

Reputation: 113

Combine regexes which have same named groups

I am trying to combine several regexes to match dates.

For example, I have:

regex1 (three named groups: 'month', 'day', 'year'):

(?:(?P<month>\d{1,2})[/-](?P<day>\d{1,2})[/-](?P<year>\d{2,4}))

regex2 (the same three named groups):

(?P<day>\d{1,2}) (?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z.]*[,.]? (?P<year>\d{4})

and a lot of other regexes.

I have tested them one by one. Now I want to combine them into one regex with a lot of "|"s between them.

I have tried:

import re
regexes = re.compile('regex1 here |'
                     'regex2 here |'
                     'regex3 here')

but it raises an error like:

redefinition of group name 'month' as group 4; was group 1 at position 59

My guess is that a group name can only occur once in a pattern?
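That guess can be checked with a minimal sketch that uses only the standard-library re module and the two patterns quoted above. The variable names are illustrative, and the exact group named in the message depends on which duplicate the parser reaches first, so it may differ from the message shown above.

```python
import re

# The two patterns from the question, joined with "|".
regex1 = r"(?:(?P<month>\d{1,2})[/-](?P<day>\d{1,2})[/-](?P<year>\d{2,4}))"
regex2 = (
    r"(?P<day>\d{1,2}) (?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    r"[a-z.]*[,.]? (?P<year>\d{4})"
)

try:
    re.compile(regex1 + "|" + regex2)
    error_message = None
except re.error as exc:
    # The stdlib engine rejects a group name that is defined twice.
    error_message = str(exc)

print(error_message)
```

Each pattern compiles fine on its own; only the combination fails, which suggests the duplicate names are the cause.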

So, how can I combine all these regexes into one with named groups?

Upvotes: 1

Views: 213

Answers (1)

wp78de

Reputation: 18960

The key to solving this is a branch reset group, which starts with (?| and is itself a non-capturing group.

Each alternative inside the parentheses reuses the same numbers for its capturing groups. This also works for named capture groups, as long as groups with the same name have the same index, or both are unnamed groups.

However, to use this PCRE feature you have to use Python's alternative regex engine, the third-party regex module:

import regex as re
regex = r"(?|(?:(?P<month>\d{1,2})[\/-](?P<day>\d{1,2})[\/-](?P<year>\d{2,4}))|(?P<day>\d{1,2}) (?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z.]*[,.]? (?P<year>\d{4}))"
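If installing the regex module is not an option, a different technique (not a branch reset) can approximate the same result with the standard-library re module: give each alternative uniquely suffixed group names, then strip the suffixes after matching. The sketch below assumes the two patterns from the question; the suffix scheme (_1, _2) and the helper name find_date are hypothetical, chosen only for illustration.

```python
import re

# Each alternative gets a unique suffix on its group names (_1, _2, ...),
# because the standard-library re module rejects duplicate group names.
# The suffix scheme and the helper name find_date are illustrative assumptions.
PATTERNS = [
    r"(?P<month_1>\d{1,2})[/-](?P<day_1>\d{1,2})[/-](?P<year_1>\d{2,4})",
    r"(?P<day_2>\d{1,2}) (?P<month_2>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    r"[a-z.]*[,.]? (?P<year_2>\d{4})",
]
COMBINED = re.compile("|".join(f"(?:{p})" for p in PATTERNS))


def find_date(text):
    """Return a dict with 'month', 'day', 'year' for the first date found, or None."""
    m = COMBINED.search(text)
    if m is None:
        return None
    # Keep only the groups that took part in the match and strip the suffix.
    return {
        name.rsplit("_", 1)[0]: value
        for name, value in m.groupdict().items()
        if value is not None
    }


print(find_date("due 12/25/2020"))
print(find_date("on 5 Mar. 2021"))
```

The trade-off is that the patterns must be edited to carry the suffixes, whereas the branch reset approach keeps the original group names untouched.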

PS: I have not inspected your patterns closely, but as others have hinted, there is room for improvement. That is another question, though.

Upvotes: 1
