Python re.match include optional group that is None

Question

I am using re.compile inside a loop over lines and then running re.match on each line. On Regex101 I get the correct matches for all groups, but in Python the match returns None when one of the groups would be an empty string. What I am after is matching on the groups even if they are empty.

The string to match on:

Interface          Status      Protocol    Description
BE1                up          up          
Mg0/RSP0/CPU0/0    up          up          NNI to Cat2960x G1/0/1
Te0/0/0/3          admin-down  admin-down  
Gi0/0/1/0          down        down        Test L2VPN
RP/0/RSP0/CPU0:LAB-9001-1#

The last group (description) should be optional and can either contain a description or be empty. This works in Regex101 and I have 4 groups on this filter:

^\s*(?:(?P<interface>[a-zA-Z0-9]\S+?))\s+(?:(?P<status>[up|admin\-down]\S+?))\s+(?:(?P<protocol>[up|admin\-down]\S+))\s+(?:(?P<description>(?

In the code I am using compile and match, but if the description is blank the match returns None. I want it to return the first 3 groups and an empty string for the 4th group (description).

for line in result.splitlines():
    line = line.rstrip()

    p1 = re.compile(r'^\s*(?:(?P<interface>[a-zA-Z0-9]\S+?))\s+(?:(?P<status>[up|admin\-down]\S+?))\s+(?:(?P<protocol>[up|admin\-down]\S+))\s+(?P<description>(?


This will not match any line whose description is blank. Is there a syntax to tell re.match to include empty groups?

Wiktor Stribiżew · Accepted Answer

The regex you use contains character classes instead of grouping constructs ([up|down] does not match up or down, it matches a single character: u, p, |, d, o, w or n). Also, the last pattern part must match an obligatory whitespace followed by any chars, but you rstrip the line, so there is no whitespace left to match.
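
Both points can be checked in isolation. The following is a minimal sketch, not the asker's code: the pattern fragments and variable names are made up for illustration.

```python
import re

# A character class matches exactly ONE character from the set u, p, |, d, o, w, n.
char_class = re.compile(r'[up|down]')
# A group with alternation matches one of the whole words.
alternation = re.compile(r'(?:up|down)')

class_hit = char_class.fullmatch('up')    # None: 'up' is two characters
single_hit = char_class.fullmatch('|')    # matches: '|' is a literal inside [...]
word_hit = alternation.fullmatch('up')    # matches the whole word
pipe_hit = alternation.fullmatch('|')     # None: '|' is the alternation operator here

# After rstrip() there is no trailing whitespace, so a mandatory \s+ cannot match.
stripped = 'up          '.rstrip()
required_tail = re.compile(r'up\s+(?P<description>.*)')
optional_tail = re.compile(r'up(?:\s+(?P<description>.*))?')

required_hit = required_tail.match(stripped)   # None: \s+ has nothing to consume
optional_hit = optional_tail.match(stripped)   # matches; the optional part is skipped

print(class_hit, single_hit, word_hit, pipe_hit)
print(required_hit, optional_hit)
```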

The fixed regex looks like

^(?P<interface>[a-zA-Z0-9]\S*)\s+(?P<status>up|admin-down)\s+(?P<protocol>up|admin-down)(?:\s+(?P<description>.*))?

Details

^ - start of string
(?P<interface>[a-zA-Z0-9]\S*) - Group "interface": an alphanumeric char followed by any 0+ non-whitespace chars
\s+ - 1+ whitespaces
(?P<status>up|admin-down) - Group "status": up or admin-down
\s+ - 1+ whitespaces
(?P<protocol>up|admin-down) - Group "protocol": up or admin-down
(?:\s+(?P<description>.*))? - an optional non-capturing group:
- \s+ - 1+ whitespaces
- (?P<description>.*) - Group "description": any 0+ chars other than line break chars, as many as possible
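
The breakdown above can also be kept next to the pattern itself. As a sketch, here is the same regex written with re.VERBOSE so that each part carries its own comment; the variable names and the choice of test line are only for illustration.

```python
import re

# Same pattern as in the details above, spread over several lines.
# With re.VERBOSE, whitespace in the pattern is ignored and '#' starts a comment.
pattern = re.compile(r"""
    ^
    (?P<interface>[a-zA-Z0-9]\S*)   # an alphanumeric, then 0+ non-whitespace chars
    \s+
    (?P<status>up|admin-down)       # up or admin-down
    \s+
    (?P<protocol>up|admin-down)     # up or admin-down
    (?:\s+(?P<description>.*))?     # optional: whitespace, then the rest of the line
""", re.VERBOSE)

m = pattern.match('Mg0/RSP0/CPU0/0    up          up          NNI to Cat2960x G1/0/1')
fields = m.groupdict()
print(fields)
```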

In Python, you may use

import re
result = r"""Interface          Status      Protocol    Description
BE1                up          up          
Mg0/RSP0/CPU0/0    up          up          NNI to Cat2960x G1/0/1
Te0/0/0/3          admin-down  admin-down  
Gi0/0/1/0          down        down        Test L2VPN
RP/0/RSP0/CPU0:LAB-9001-1#"""
p1 = re.compile(r'(?P<interface>[a-zA-Z0-9]\S*)\s+(?P<status>up|admin-down)\s+(?P<protocol>up|admin-down)(?:\s+(?P<description>.*))?')

for line in result.splitlines():
    line = line.rstrip()
    m = p1.match(line)
    if m:
        print(m.groups())

See the Python demo
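
One detail worth noting: when the description is blank, the optional group takes no part in the match, so m.groups() reports None for it rather than an empty string. If an empty string is what you want, Match.groups() and Match.groupdict() both accept a default argument. A minimal sketch using three of the sample lines (the lines list and the rows variable are just for illustration):

```python
import re

p1 = re.compile(
    r'(?P<interface>[a-zA-Z0-9]\S*)\s+(?P<status>up|admin-down)\s+'
    r'(?P<protocol>up|admin-down)(?:\s+(?P<description>.*))?'
)

lines = [
    'BE1                up          up          ',
    'Mg0/RSP0/CPU0/0    up          up          NNI to Cat2960x G1/0/1',
    'Te0/0/0/3          admin-down  admin-down  ',
]

rows = []
for line in lines:
    m = p1.match(line.rstrip())
    if m:
        # default='' replaces None for groups that did not take part in the match
        rows.append(m.groups(default=''))

for row in rows:
    print(row)
```

Also note that the pattern only lists up and admin-down, so the sample row with status "down" is skipped. If such rows should be captured too, extending both alternations to something like up|admin-down|down is one possible adjustment.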

Note that the ^ start of string anchor is not necessary if you use re.match, since re.match only looks for a match at the start of the string.
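
The anchoring behaviour is easy to see by comparing re.match with re.search. A small sketch with a made-up test string:

```python
import re

pattern = re.compile(r'(?P<status>up|admin-down)')
text = 'BE1                up          up'

at_start = pattern.match(text)    # None: the text starts with 'BE1', not a status
anywhere = pattern.search(text)   # finds the first 'up' later in the string
# With an explicit '^', search is restricted to the start of the string as well
anchored = re.search(r'^(?:up|admin-down)', text)

print(at_start, anywhere.group('status'), anchored)
```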
