mrCarnivore

Reputation: 5078

Regex return match and extended matches

Can a regex return both a base match and an optional extended match? What I mean is a single regular expression that returns a different number of found elements depending on the structure of the input. My text is:

AB : CDE / 123.456.1; 1
AC : DEF / 3.1.2

My return (match) should be:

'AB', 'CDE', '123.456.1', '1'
'AC', 'DEF', '3.1.2'

So if there is a value after a semicolon, the regex should match and return that as well. But if it is not there, it should still match and return the rest.

My code is:

import re

s = '''AB : CDE / 123.456.1; 1
AC : DEF / 3.1.2'''

match1 = re.search(r'((?:AB|AC))\s*:\s*(\w+)\s*\/\s*([\w.]+)\s*(;\s*\d+)', s)
print(match1[0])

match2 = re.search(r'((?:AB|AC))\s*:\s*(\w+)\s*\/\s*([\w.]+)\s*', s)
print(match2[0])

Here match1 only matches the first occurrence and match2 only the second. What regex would work in both cases?

Upvotes: 1

Views: 112

Answers (1)

Wiktor Stribiżew

Reputation: 626845

The r'((?:AB|AC))\s*:\s*(\w+)\s*\/\s*([\w.]+)\s*(;\s*\d+)' pattern contains an obligatory (;\s*\d+) pattern at the end. You need to make it optional and you may do it by adding a ? quantifier after it, so as to match 1 or 0 occurrences of the subpattern.
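
As a minimal sketch of that fix, here is the first pattern from the question with only a ? appended to the last group. Using re.finditer instead of re.search is my assumption here, chosen so that both lines of the sample text are visited:

```python
import re

s = '''AB : CDE / 123.456.1; 1
AC : DEF / 3.1.2'''

# The question's first pattern, with "?" added after the final group
# so that the "; <digits>" part may be present or absent.
pattern = r'((?:AB|AC))\s*:\s*(\w+)\s*\/\s*([\w.]+)\s*(;\s*\d+)?'

# Collect the whole-match text for every occurrence in the string.
matches = [m.group(0) for m in re.finditer(pattern, s)]
print(matches)
```

Both lines are now matched: the first with its trailing "; 1", the second without.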

With other minor enhancements, you may use

r'A[BC]\s*:\s*\w+\s*/\s*[\w.]+\s*(?:;\s*\d+)?'

Note that all capturing groups are removed and the optional part is wrapped in a non-capturing group, since you only use the whole match value (match[0]) in the end.

Details

  • A[BC] - AB or AC
  • \s*:\s* - a colon enclosed with 0+ whitespace chars
  • \w+ - 1 or more word chars
  • \s*/\s* - a / enclosed with 0+ whitespace chars
  • [\w.]+ - 1 or more word or . chars
  • \s* - 0+ whitespaces
  • (?:;\s*\d+)? - an optional sequence of
    • ; - a ;
    • \s* - 0+ whitespaces
    • \d+ - 1+ digits
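
If the separate fields are needed, as in the expected output shown in the question, one possible variant is to put capturing groups back around the parts of interest and drop the group that did not participate. This is only a sketch: the placement of the groups and the names pattern, results and fields are my own choices, not part of the pattern above.

```python
import re

s = '''AB : CDE / 123.456.1; 1
AC : DEF / 3.1.2'''

# Same structure as the pattern above, but with capturing groups around
# the four fields. The optional part stays non-capturing; only the digits
# after the semicolon are captured.
pattern = re.compile(r'(A[BC])\s*:\s*(\w+)\s*/\s*([\w.]+)\s*(?:;\s*(\d+))?')

results = []
for m in pattern.finditer(s):
    # A group that did not participate in the match is None; filter it out.
    fields = [g for g in m.groups() if g is not None]
    results.append(fields)
    print(fields)
```

The first line yields four fields and the second yields three, because the optional group is None when there is no semicolon part.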

Upvotes: 3
