tron_jones

Reputation: 219

Python re.match include optional group that is None

I am calling re.compile inside a for loop over lines and then running re.match on each line. On Regex101 I get the correct matches for all groups, but in Python the match returns None when the last group would be an empty string. What I am after is a match on all groups even when some of them are empty.

The string to match on:

Interface          Status      Protocol    Description
BE1                up          up          
Mg0/RSP0/CPU0/0    up          up          NNI to Cat2960x G1/0/1
Te0/0/0/3          admin-down  admin-down  
Gi0/0/1/0          down        down        Test L2VPN
RP/0/RSP0/CPU0:LAB-9001-1#

The last group (description) should be optional and can either contain a description or be empty. This works in Regex101 and I have 4 groups on this filter:

^\s*(?:(?P<interface>[a-zA-Z0-9]\S+?))\s+(?:(?P<status>[up|admin\-down]\S+?))\s+(?:(?P<protocol>[up|admin\-down]\S+))\s+(?:(?P<description>(?<!^).*))

In the code I use compile and match, but when the description is blank the match returns None. I want it to return the first 3 groups and an empty string for the 4th group (description).

for line in result.splitlines():
    line = line.rstrip()

    p1 = re.compile(r'^\s*(?:(?P<interface>[a-zA-Z0-9]\S+?))\s+(?:(?P<status>[up|admin\-down]\S+?))\s+(?:(?P<protocol>[up|admin\-down]\S+))\s+(?P<description>(?<!^).*)')
    m = p1.match(line).groups()
    print(m)

This does not match any line whose description is blank. Is there a syntax to tell re.match to include empty groups?

Upvotes: 1

Views: 1457

Answers (1)

Wiktor Stribiżew

Reputation: 627119

The regex you use contains character classes instead of grouping constructs ([up|down] does not match up or down; it matches a single character: u, p, |, d, o, w or n). Also, the last part of the pattern requires at least one whitespace character followed by any chars, but you rstrip the line, so there is no whitespace left to match.
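As a small illustration of both points, the sketch below compares the character class from the question with an alternation group, and then shows a mandatory trailing \s+ failing on a stripped line. The names char_class, alternation, and mandatory_tail are made up for this example, and mandatory_tail is a simplified stand-in for the original pattern, not a copy of it.

```python
import re

# A character class matches ONE character out of the listed set.
char_class = re.compile(r'[up|admin\-down]')
# An alternation inside a group matches one of the whole words.
alternation = re.compile(r'(?:up|admin-down)')

print(char_class.fullmatch('up'))           # None: the class consumes a single character
print(bool(char_class.fullmatch('|')))      # True: '|' is a literal inside [...]
print(bool(alternation.fullmatch('up')))    # True
print(bool(alternation.fullmatch('admin-down')))  # True

# Simplified stand-in for the original pattern: three columns, then a
# mandatory run of whitespace before the description.
mandatory_tail = re.compile(r'\S+\s+\S+\s+\S+\s+(?P<description>.*)')

raw = 'BE1                up          up          '
print(bool(mandatory_tail.match(raw)))      # True: trailing spaces satisfy the last \s+
print(mandatory_tail.match(raw.rstrip()))   # None: nothing left for the last \s+
```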

The fixed regex looks like

^(?P<interface>[a-zA-Z0-9]\S*)\s+(?P<status>up|admin-down)\s+(?P<protocol>up|admin-down)(?:\s+(?P<description>.*))?

Details

  • ^ - start of string
  • (?P<interface>[a-zA-Z0-9]\S*) - Group "interface": an alphanumeric char followed by any 0+ non-whitespace chars
  • \s+ - 1+ whitespaces
  • (?P<status>up|admin-down) - Group "status": up or admin-down
  • \s+ - 1+ whitespaces
  • (?P<protocol>up|admin-down) - Group "protocol": up or admin-down
  • (?:\s+(?P<description>.*))? - an optional group:
    • \s+ - 1+ whitespaces
    • (?P<description>.*) - Group "description": any 0+ chars other than line break chars, as many as possible
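One consequence of the optional group is worth spelling out: when it does not take part in the match, Python reports the "description" group as None rather than as an empty string. A minimal sketch, assuming an empty string is what you want in that case, is to pass default='' to groups() or groupdict(). The variable names here are illustrative only.

```python
import re

pattern = re.compile(
    r'(?P<interface>[a-zA-Z0-9]\S*)\s+'
    r'(?P<status>up|admin-down)\s+'
    r'(?P<protocol>up|admin-down)'
    r'(?:\s+(?P<description>.*))?'
)

# A line whose description column is empty, already stripped.
m = pattern.match('BE1                up          up')

print(m.groups())            # ('BE1', 'up', 'up', None)
print(m.groups(default=''))  # ('BE1', 'up', 'up', '')

# groupdict() accepts the same default and keeps the group names.
fields = m.groupdict(default='')
print(fields['description'] == '')  # True
```

The default argument only affects groups that did not participate in the match, so lines that do carry a description are returned unchanged.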

In Python, you may use

import re
result = r"""Interface          Status      Protocol    Description
BE1                up          up          
Mg0/RSP0/CPU0/0    up          up          NNI to Cat2960x G1/0/1
Te0/0/0/3          admin-down  admin-down  
Gi0/0/1/0          down        down        Test L2VPN
RP/0/RSP0/CPU0:LAB-9001-1#"""
p1 = re.compile(r'(?P<interface>[a-zA-Z0-9]\S*)\s+(?P<status>up|admin-down)\s+(?P<protocol>up|admin-down)(?:\s+(?P<description>.*))?')

for line in result.splitlines():
    line = line.rstrip()
    m = p1.match(line)
    if m:
        print(m.groups())
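
Note that the pattern above only accepts up and admin-down, so the Gi0/0/1/0 line, whose status is plain down, is skipped. If that line should be captured too (an assumption about the intended output, not something the pattern above does), a possible variant adds down to both alternations. SAMPLE, pattern_with_down, and rows are names invented for this sketch.

```python
import re

SAMPLE = """Interface          Status      Protocol    Description
BE1                up          up
Mg0/RSP0/CPU0/0    up          up          NNI to Cat2960x G1/0/1
Te0/0/0/3          admin-down  admin-down
Gi0/0/1/0          down        down        Test L2VPN
RP/0/RSP0/CPU0:LAB-9001-1#"""

# Same structure as the fixed regex, with 'down' added as a third alternative.
pattern_with_down = re.compile(
    r'(?P<interface>[a-zA-Z0-9]\S*)\s+'
    r'(?P<status>up|down|admin-down)\s+'
    r'(?P<protocol>up|down|admin-down)'
    r'(?:\s+(?P<description>.*))?'
)

rows = []
for line in SAMPLE.splitlines():
    m = pattern_with_down.match(line.rstrip())
    if m:  # the header and the prompt line do not match and are skipped
        rows.append(m.groupdict(default=''))

for row in rows:
    print(row)
```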

Note the ^ start of string anchor is not necessary if you use re.match.
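
A quick way to see this is to compare re.match with re.search on the same pattern. The sketch below uses a deliberately tiny pattern (status_only, a name made up for this example) rather than the full regex.

```python
import re

status_only = re.compile(r'up|admin-down')
line = 'BE1                up          up'

# match() only tries position 0, as if the pattern started with ^.
print(status_only.match(line))           # None: the line starts with 'BE1'

# search() scans forward and finds the first occurrence anywhere.
print(status_only.search(line).group())  # up
print(status_only.search(line).start())  # 19
```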

Upvotes: 1
