Elle

Reputation: 15

Python Regex no groups after alternation operator

I wrote a regex pattern in Python, but re.match() does not capture the groups after the | alternation operator.

Here is the pattern:

pattern = r"00([1-9]\d) ([1-9]\d) ([1-9]\d{5})|\+([1-9]\d) ([1-9]\d) ([1-9]\d{5})"

I test the pattern against a string that should match, "+12 34 567890":

import re
strng = "+12 34 567890"
pattern = r"00([1-9]\d) ([1-9]\d) ([1-9]\d{5})|\+([1-9]\d) ([1-9]\d) ([1-9]\d{5})"
m = re.match(pattern, strng)
print(m.group(1))

None is printed.

But if I delete the part before the | alternation operator

import re
strng = "+12 34 567890"
pattern = r"\+([1-9]\d) ([1-9]\d) ([1-9]\d{5})"
m = re.match(pattern, strng)
print(m.group(1))
print(m.group(2))
print(m.group(3))

It can capture all 3 groups:

12
34
567890

Thanks so much for your thoughts!

Upvotes: 1

Views: 226

Answers (2)

Wiktor Stribiżew

Reputation: 626926

You want to support two different patterns, one with 00 and the other with + at the start. You may merge the alternatives using a non-capturing group:

import re
strng = "+12 34 567890"
pattern = r"(?:00|\+)([1-9]\d) ([1-9]\d) ([1-9]\d{5})$"
m = re.match(pattern, strng)
if m:
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))

See the regex demo and the Python demo yielding

12
34
567890

The regex at the regex testing site is prepended with ^ (start of string) because re.match only matches at the start of the string. The whole pattern now matches:

  • ^ - start of string (implicit in re.match)
  • (?:00|\+) - either 00 or + (a non-capturing group, so it does not affect group numbering)
  • ([1-9]\d) - Capturing group 1: a digit from 1 to 9 and then any digit
  • ' ' - a space (replace with \s to match any single whitespace character)
  • ([1-9]\d) - Capturing group 2: a digit from 1 to 9 and then any digit
  • ' ' - a space (replace with \s to match any single whitespace character)
  • ([1-9]\d{5}) - Capturing group 3: a digit from 1 to 9 and then any 5 digits
  • $ - end of string.

Remove $ if you do not need to match the end of the string right after the number.
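
As a quick check of the explanation above, here is a minimal sketch that runs the merged pattern against both prefixes and against a string with trailing text. The helper name parse_number is hypothetical and only used for illustration; it is not part of the original answer.

```python
import re

# Merged pattern from this answer: "00" or "+" as the prefix, then three groups.
PATTERN = re.compile(r"(?:00|\+)([1-9]\d) ([1-9]\d) ([1-9]\d{5})$")

def parse_number(text):
    """Hypothetical helper: return the three captured parts, or None if no match."""
    m = PATTERN.match(text)
    return m.groups() if m else None

print(parse_number("+12 34 567890"))    # ('12', '34', '567890')
print(parse_number("0012 34 567890"))   # ('12', '34', '567890')
print(parse_number("+12 34 567890 x"))  # None, the trailing $ rejects extra text
```

Because the prefix sits in a non-capturing group, the three parts are always groups 1 to 3, whichever prefix matched.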

Upvotes: 0

llllllllll

Reputation: 16404

The | operator has no effect on group indices: groups are always numbered from left to right by the position of their opening parenthesis in the regex itself.

In your original regex, there are 6 groups:

In [270]: m.groups()
Out[270]: (None, None, None, '12', '34', '567890')

The second alternative is the one that matched, so you need:

In [271]: m.group(4)
Out[271]: '12'
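
If you would rather keep the original pattern and not track which alternative matched, one option is to drop the groups that did not participate. This is only a sketch of that idea; the variable name parts is an assumption for illustration.

```python
import re

strng = "+12 34 567890"
pattern = r"00([1-9]\d) ([1-9]\d) ([1-9]\d{5})|\+([1-9]\d) ([1-9]\d) ([1-9]\d{5})"
m = re.match(pattern, strng)

# Groups 1-3 belong to the "00" alternative, which did not take part in the match.
print(m.groups())  # (None, None, None, '12', '34', '567890')

# Keep only the groups that actually captured something.
parts = [g for g in m.groups() if g is not None]
print(parts)       # ['12', '34', '567890']
```

This works here because every group in the matching alternative is mandatory, so the only None values come from the alternative that was not used.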

Upvotes: 1
