Snufkin

Reputation: 3

How to use regex to capture a word after another word, with newline in between

I've been working on a regular expression in Python that matches a certain sequence once after each occurrence of a word. I've been running into two problems: 1) the text I'm searching has a variable number of characters between the trigger word and the word I actually want to match, and 2) the text is multi-line.

In the following example text, I want to match "lag-10:10" and "lag-10:20" but not match "lag-10:30":

    vprn 5001 name "5001" customer 1 create
        interface "to-VPLS-6663000" create
            sap lag-10:10 create
        interface "to-VPLS-3000500" create
            sap lag-10:20 create
        vpls 3410001 name "XYZBDVLAN1" customer 1 create
            sap lag-10:30 create

The result I am looking for is "'lag-10:10', 'lag-10:20'". I can't just match on those terms alone, because any number can follow "lag-", and I only want to capture that group if it comes after the word "interface" (and not, as with lag-10:30 in the example above, after the word "vpls").

With the networking equipment I'm working with, the text between the word "interface" and "lag" can vary. My initial idea was to match the sequence lag-10:[^ ]* only once, and only after an occurrence of the word "interface". The problem is that I have no idea how to go about doing that. Everything I've tried captures either too much or too little text, and it is complicated by the fact that "lag" is on a different line than "interface".

Any help would be appreciated, as I am very new to regex!

Upvotes: 0

Views: 969

Answers (2)

Robin Sage

Reputation: 959

Here are a couple of solutions that should work as long as the values are always as you specified. Note that they match on the values themselves and do not check for a preceding "interface" line:

import re
text = """    vprn 5001 name "5001" customer 1 create
        interface "to-VPLS-6663000" create
            sap lag-10:10 create
        interface "to-VPLS-3000500" create
            sap lag-10:20 create
        vpls 3410001 name "XYZBDVLAN1" customer 1 create
            sap lag-10:30 create"""

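# Match only the suffixes 10 or 20 (tens digit 1 or 2, then a literal 0)
pattern = r'lag-\d{2}:[1-2]0'
result = re.findall(pattern, text)
print(result)  # ['lag-10:10', 'lag-10:20']

# Slightly looser: tens digit 1 or 2, then any single digit
pattern = r'lag-\d{2}:[1-2]\d'
result = re.findall(pattern, text)
print(result)  # ['lag-10:10', 'lag-10:20']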

Upvotes: 0

Tim Biegeleisen

Reputation: 521289

Here is one approach using re.findall in multiline mode:

import re

inp = """    vprn 5001 name "5001" customer 1 create
    interface "to-VPLS-6663000" create
        sap lag-10:10 create
    interface "to-VPLS-3000500" create
        sap lag-10:20 create
    vpls 3410001 name "XYZBDVLAN1" customer 1 create
        sap lag-10:30 create"""

matches = re.findall(r'^\s+\binterface.*?\n\s+sap (lag-\d{1,2}:\d{2})', inp, flags=re.M)
print(matches)  # ['lag-10:10', 'lag-10:20']

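The regex pattern above matches a line whose first word is interface, followed directly by a line containing sap and a LAG identifier such as lag-10:10, which is captured. Because the sap line must come right after the interface line, the pattern will not match if other non-blank lines appear in between.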
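Since the question mentions that the text between "interface" and "lag" can vary, a line-by-line scan may be more forgiving than a single regex. The following is only a sketch under some assumptions: the helper name first_lag_per_interface is hypothetical, the identifier is assumed to look like lag-<digits>:<digits>, an interface block is assumed to end at the next line indented no deeper than its interface line, and the description line in the sample is invented to show an intervening line.

```python
import re

# Assumed identifier format: "sap lag-<digits>:<digits>"
LAG_RE = re.compile(r"\bsap\s+(lag-\d+:\d+)")

def first_lag_per_interface(config):
    """Return the first 'sap lag-N:M' id found inside each interface block."""
    found = []
    header_indent = None  # indentation of the current interface line, if any
    waiting = False       # True until this block's first lag id is captured
    for line in config.splitlines():
        stripped = line.lstrip()
        if not stripped:
            continue  # skip blank lines
        indent = len(line) - len(stripped)
        # A line indented no deeper than the header closes the interface block
        if header_indent is not None and indent <= header_indent:
            header_indent = None
        if stripped.startswith("interface "):
            header_indent = indent
            waiting = True
            continue
        if header_indent is not None and waiting:
            m = LAG_RE.search(stripped)
            if m:
                found.append(m.group(1))
                waiting = False  # capture only once per interface
    return found

# Sample from the question, plus one invented "description" line
sample = """\
    vprn 5001 name "5001" customer 1 create
        interface "to-VPLS-6663000" create
            description "hypothetical extra line"
            sap lag-10:10 create
        interface "to-VPLS-3000500" create
            sap lag-10:20 create
        vpls 3410001 name "XYZBDVLAN1" customer 1 create
            sap lag-10:30 create
"""

print(first_lag_per_interface(sample))
```

Tracking indentation avoids hard-coding which keywords (such as vpls) end an interface block, but it does rely on the configuration being consistently indented.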

Upvotes: 1
