Yogesh

Reputation: 26

A regex in Python for matching multiple lines of a certain pattern

Hi, I am trying to build a multiline regex that groups a line together with the lines after it that begin with at least one whitespace character. For example:

interface Ethernet 1/1

      ip address <>
      mtu <>

ip tcp path-mtu-discovery

router bgp 100

     network 1.1.1.0

How can I build a regex that puts "interface Ethernet 1/1" and its subconfig into one group, "ip tcp path-mtu-discovery" into another group, and "router bgp 100" and its subcommands into a third group? In other words, a line beginning with a non-whitespace character should be grouped with any lines that follow it and begin with whitespace. Two consecutive lines that both begin with a non-whitespace character should form two different groups.

I tried some of the regexes already discussed, but they didn't help.

Thanks in advance

Upvotes: 0

Views: 584

Answers (1)

falsetru

Reputation: 368944

>>> import re
>>> lines = '''interface Ethernet 1/1
...
...       ip address <>
...       mtu <>
...
... ip tcp path-mtu-discovery
...
... router bgp 100
...
...      network 1.1.1.0
... '''
>>> for x in re.findall(r'^\S.*(?:\n(?:[ \t].*|$))*', lines, flags=re.MULTILINE):
...     print(repr(x))
...
'interface Ethernet 1/1\n\n      ip address <>\n      mtu <>\n'
'ip tcp path-mtu-discovery\n'
'router bgp 100\n\n     network 1.1.1.0\n'
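  • ^\S.*: matches a line that starts with a non-whitespace character (with re.MULTILINE, ^ and $ match at the start and end of every line).
  • \n[ \t].*: matches a following line that starts with a space or tab.
  • \n$: matches an empty line.
  • \n(?:[ \t].*|$): matches a following line that either starts with a space or tab or (|) is empty.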
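If the raw matched text is less convenient than a list of lines, the same pattern can be wrapped in a small helper. This is only a sketch: the names SECTION_RE and split_sections are made up for illustration, and it assumes blank lines and indentation can be discarded once the grouping is done.

```python
import re

# Same pattern as above: a line starting with a non-whitespace character,
# followed by any number of indented or empty lines.
SECTION_RE = re.compile(r'^\S.*(?:\n(?:[ \t].*|$))*', flags=re.MULTILINE)

def split_sections(text):
    """Return one list of stripped, non-empty lines per config section.

    Hypothetical helper, not part of the re module.
    """
    sections = []
    for block in SECTION_RE.findall(text):
        # Drop blank lines and leading/trailing whitespace inside each block.
        sections.append([line.strip() for line in block.splitlines() if line.strip()])
    return sections

config = """interface Ethernet 1/1

      ip address <>
      mtu <>

ip tcp path-mtu-discovery

router bgp 100

     network 1.1.1.0
"""

for section in split_sections(config):
    print(section)
```

The first element of each list is the top-level command, and the rest are its subcommands.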

Using itertools.groupby:

lines = '''interface Ethernet 1/1

      ip address <>
      mtu <>

ip tcp path-mtu-discovery

router bgp 100

     network 1.1.1.0
'''

import itertools

class LineState:
    def __init__(self):
        self.state = 0
    def __call__(self, line):
        # According to the return value of this
        # method, lines are grouped; lines of same values are
        # grouped together.
        if line and not line[0].isspace():
            # Change state on new config section
            self.state += 1
        return self.state

for _, group in itertools.groupby(lines.splitlines(), key=LineState()):
    print(list(group))

prints:

['interface Ethernet 1/1', '', '      ip address <>', '      mtu <>', '']
['ip tcp path-mtu-discovery', '']
['router bgp 100', '', '     network 1.1.1.0']
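If the goal is to look up subcommands by their parent command, a plain loop that builds a dictionary may be enough, without regex or groupby. This is a rough sketch, not the approach from the answer above: parse_config is a made-up name, and it assumes each top-level line is unique (a repeated top-level line would overwrite the earlier entry).

```python
def parse_config(text):
    """Map each top-level line to the list of indented lines below it.

    Hypothetical helper; assumes top-level lines are unique.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank or whitespace-only lines
        if not line[0].isspace():
            # A line starting with non-whitespace opens a new section.
            current = line.strip()
            sections[current] = []
        elif current is not None:
            # An indented line belongs to the most recent section.
            sections[current].append(line.strip())
    return sections

config = """interface Ethernet 1/1

      ip address <>
      mtu <>

ip tcp path-mtu-discovery

router bgp 100

     network 1.1.1.0
"""

for header, subcommands in parse_config(config).items():
    print(header, subcommands)
```

Indented lines that appear before any top-level line are ignored in this sketch.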

Upvotes: 1
