Reputation: 5078
Can a regex return matches and extended matches? What I mean is a single regular expression that can return a different number of found elements depending on the structure. My text is:
AB : CDE / 123.456.1; 1
AC : DEF / 3.1.2
My return (match) should be:
'AB', 'CDE', '123.456.1', '1'
'AC', 'DEF', '3.1.2'
So if there is a value after a semicolon, the regex should match and return it as well. But if it is not there, the regex should still match and return the rest.
My code is:
import re
s = '''AB : CDE / 123.456.1; 1
AC : DEF / 3.1.2'''
match1 = re.search(r'((?:AB|AC))\s*:\s*(\w+)\s*\/\s*([\w.]+)\s*(;\s*\d+)', s)
print(match1[0])
match2 = re.search(r'((?:AB|AC))\s*:\s*(\w+)\s*\/\s*([\w.]+)\s*', s)
print(match2[0])
Where match1 only matches the first occurrence and match2 only the second. What regex would work in both cases?
Upvotes: 1
Views: 112
Reputation: 626845
The r'((?:AB|AC))\s*:\s*(\w+)\s*\/\s*([\w.]+)\s*(;\s*\d+)' pattern contains an obligatory (;\s*\d+) pattern at the end. You need to make it optional, and you may do it by adding a ? quantifier after it, so as to match 1 or 0 occurrences of the subpattern.
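As a minimal sketch of that change, the original pattern from the question can be kept as is, with only a ? appended to the last group. Using re.finditer instead of re.search is an assumption made here so that every line of the sample text is reported; the variable names are illustrative.

```python
import re

s = '''AB : CDE / 123.456.1; 1
AC : DEF / 3.1.2'''

# Original pattern from the question, with the trailing group made optional via "?"
pattern = r'((?:AB|AC))\s*:\s*(\w+)\s*\/\s*([\w.]+)\s*(;\s*\d+)?'

# Collect the whole-match text of every occurrence in the input
whole_matches = [m.group(0) for m in re.finditer(pattern, s)]
for text in whole_matches:
    print(text)
```

Both lines of the sample now match: the first includes the "; 1" part, the second simply ends after the dotted number.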
With other minor enhancements, you may use
r'A[BC]\s*:\s*\w+\s*/\s*[\w.]+\s*(?:;\s*\d+)?'
Note all capturing groups are removed, and non-capturing ones are introduced since you only get the whole match value in the end.
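A short sketch of how the simplified pattern behaves on the sample text. The second pattern is not part of the answer: it is an assumed variant that adds capturing groups back, in case the separate fields listed in the question are wanted rather than the whole match. The names whole, fields_re and fields are illustrative.

```python
import re

s = '''AB : CDE / 123.456.1; 1
AC : DEF / 3.1.2'''

# No capturing groups: findall returns each whole match as a string
whole = re.findall(r'A[BC]\s*:\s*\w+\s*/\s*[\w.]+\s*(?:;\s*\d+)?', s)

# Assumed variant: capturing groups added back to extract the fields;
# the last group sits inside the optional part, so it may be None
fields_re = re.compile(r'(A[BC])\s*:\s*(\w+)\s*/\s*([\w.]+)\s*(?:;\s*(\d+))?')
fields = [tuple(g for g in m.groups() if g is not None)
          for m in fields_re.finditer(s)]

print(whole)
print(fields)
```

Dropping the None entries gives a 4-tuple for the first line and a 3-tuple for the second, which is the shape of the result asked for in the question.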
Details
A[BC] - AB or AC
\s*:\s* - a colon enclosed with 0+ whitespace chars
\w+ - 1 or more word chars
\s*/\s* - a / enclosed with 0+ whitespace chars
[\w.]+ - 1 or more word or . chars
\s* - 0+ whitespaces
(?:;\s*\d+)? - an optional sequence of
  ; - a ;
  \s* - 0+ whitespaces
  \d+ - 1+ digits
Upvotes: 3