Reputation: 15
I wrote a regex match pattern in Python, but re.match() does not capture the groups after the | alternation operator.
Here is the pattern:
pattern = r"00([1-9]\d) ([1-9]\d) ([1-9]\d{5})|\+([1-9]\d) ([1-9]\d) ([1-9]\d{5})"
I feed the pattern a string that should match, "+12 34 567890":
import re
strng = "+12 34 567890"
pattern = r"00([1-9]\d) ([1-9]\d) ([1-9]\d{5})|\+([1-9]\d) ([1-9]\d) ([1-9]\d{5})"
m = re.match(pattern, strng)
print(m.group(1))
None is printed.
But if I delete the part before the | alternation operator:
strng = "+12 34 567890"
pattern = r"\+([1-9]\d) ([1-9]\d) ([1-9]\d{5})"
m = re.match(pattern, strng)
print(m.group(1))
print(m.group(2))
print(m.group(3))
It can capture all 3 groups:
12
34
567890
Thanks so much for your thoughts!
Upvotes: 1
Views: 226
Reputation: 626926
You want to support two different patterns, one with 00 and the other with + at the start. You may merge the alternatives using a non-capturing group:
import re
strng = "+12 34 567890"
pattern = r"(?:00|\+)([1-9]\d) ([1-9]\d) ([1-9]\d{5})$"
m = re.match(pattern, strng)
if m:
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))
See the regex demo and the Python demo yielding
12
34
567890
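As a quick check that the merged pattern treats both prefixes alike, here is a minimal sketch. The helper name split_number and the extra test strings are illustrative assumptions, not part of the original code.

```python
import re

# Same pattern as above: either prefix, then three captured parts.
PATTERN = re.compile(r"(?:00|\+)([1-9]\d) ([1-9]\d) ([1-9]\d{5})$")

def split_number(text):
    """Return the three captured parts as a tuple, or None if there is no match."""
    m = PATTERN.match(text)
    return m.groups() if m else None

print(split_number("+12 34 567890"))   # ('12', '34', '567890')
print(split_number("0012 34 567890"))  # ('12', '34', '567890')
print(split_number("12 34 567890"))    # None (no 00 or + prefix)
```

Because the prefix sits in a non-capturing group, the three parts are always groups 1 to 3, whichever prefix was used.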
The regex at the regex testing site is prepended with ^ (start of string) because re.match only matches at the start of the string. The whole pattern now matches:
^ - start of string (implicit in re.match)
(?:00|\+) - a 00 or + substring
([1-9]\d) - Capturing group 1: a digit from 1 to 9 and then any digit
' ' - a space (replace with \s to match any single whitespace character)
([1-9]\d) - Capturing group 2: a digit from 1 to 9 and then any digit
' ' - a space (replace with \s to match any single whitespace character)
([1-9]\d{5}) - Capturing group 3: a digit from 1 to 9 and then any 5 digits
$ - end of string. Remove $ if you do not need to match the end of the string right after the number.
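To illustrate the notes on $ and \s, the sketch below compares the pattern with and without $, and with \s in place of the literal space. The variable names and sample strings are made up for illustration.

```python
import re

anchored = r"(?:00|\+)([1-9]\d) ([1-9]\d) ([1-9]\d{5})$"
unanchored = r"(?:00|\+)([1-9]\d) ([1-9]\d) ([1-9]\d{5})"
any_space = r"(?:00|\+)([1-9]\d)\s([1-9]\d)\s([1-9]\d{5})$"

trailing = "+12 34 567890 ext. 5"  # extra text after the number
tabbed = "+12\t34\t567890"         # tabs instead of spaces

print(re.match(anchored, trailing))             # None: text follows the number
print(re.match(unanchored, trailing).groups())  # ('12', '34', '567890')
print(re.match(anchored, tabbed))               # None: a tab is not a literal space
print(re.match(any_space, tabbed).groups())     # ('12', '34', '567890')
```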
Upvotes: 0
Reputation: 16404
'|' has nothing to do with group indexes: groups are always numbered from left to right by their position in the regex itself.
In your original regex, there are 6 groups:
In [270]: m.groups()
Out[270]: (None, None, None, '12', '34', '567890')
The second alternative is the part that matched, so you need:
In [271]: m.group(4)
Out[271]: '12'
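If you would rather keep the original six-group pattern, one possible way to avoid hard-coding the index is to drop the groups that did not take part in the match. A minimal sketch, reusing the pattern from the question (the name parts is just for illustration):

```python
import re

strng = "+12 34 567890"
pattern = r"00([1-9]\d) ([1-9]\d) ([1-9]\d{5})|\+([1-9]\d) ([1-9]\d) ([1-9]\d{5})"
m = re.match(pattern, strng)

# Groups from the alternative that did not match are None; keep the rest.
parts = [g for g in m.groups() if g is not None]
print(parts)        # ['12', '34', '567890']
print(m.lastindex)  # 6: index of the last group that took part in the match
```

This only works here because each alternative has the same number of groups and none of them is optional.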
Upvotes: 1