Reputation: 3
I've been working on a regular expression in Python that should match a certain sequence once after each occurrence of a word. I've run into two problems: 1) the text I'm searching has a variable number of characters between the trigger word and the word I actually want to match, and 2) the text is multi-line.
In the following example text, I want to match "lag-10:10" and "lag-10:20" but not match "lag-10:30":
vprn 5001 name "5001" customer 1 create
interface "to-VPLS-6663000" create
sap lag-10:10 create
interface "to-VPLS-3000500" create
sap lag-10:20 create
vpls 3410001 name "XYZBDVLAN1" customer 1 create
sap lag-10:30 create
The result I am looking for is "'lag-10:10', 'lag-10:20'". I can't just match on those exact terms, because any numbers may follow "lag-", and I only want to capture that group if it comes after the word "interface" (and not, as with lag-10:30 in the example above, after the word "vpls").
With the networking equipment I'm working with, the text between the word "interface" and "lag" can vary. My initial idea is to match the sequence lag-10:[^ ]* only once, and only after an occurrence of the word "interface". The problem is that I have no idea how to go about doing that. Everything I've tried captures either too much or too little text, and it is complicated by the fact that "lag" is on a different line than "interface".
Any help would be appreciated, as I am very new to regex!
Upvotes: 0
Views: 969
Reputation: 959
Here are a couple of patterns that should work as long as the values are always exactly as you specified. Note that they match on the numbers themselves and do not check for a preceding "interface" line:
import re
text = """ vprn 5001 name "5001" customer 1 create
interface "to-VPLS-6663000" create
sap lag-10:10 create
interface "to-VPLS-3000500" create
sap lag-10:20 create
vpls 3410001 name "XYZBDVLAN1" customer 1 create
sap lag-10:30 create"""
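# Match a two-digit lag number followed by :10 or :20 only
pattern = r'lag-\d{2}:[1-2]0'
result = re.findall(pattern, text)
print(result)  # ['lag-10:10', 'lag-10:20']
# Looser variant: any suffix from 10 to 29
pattern = r'lag-\d{2}:[1-2]\d'
result = re.findall(pattern, text)
print(result)  # ['lag-10:10', 'lag-10:20']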
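If the numbers after "lag-" can vary, the match has to be tied to the preceding "interface" line instead of to the values. Below is a minimal sketch of one way to do that with a single regex, not a definitive solution. It assumes that "interface", "vpls" and "vprn" are the only keywords that start a new block; the names BLOCK_START, INTERFACE_SAP and find_interface_saps, and the extra "description" line in the sample, are made up for illustration.

```python
import re

# Assumption: these keywords start a new block in the configuration.
BLOCK_START = r'^[ \t]*(?:interface|vpls|vprn)\b'

INTERFACE_SAP = re.compile(
    r'^[ \t]*interface\b'               # trigger line
    r'(?:(?!' + BLOCK_START + r').)*?'  # any text, but stop at the next block
    r'\bsap\s+(lag-\d+:\d+)',           # first sap after the trigger
    flags=re.MULTILINE | re.DOTALL,
)

def find_interface_saps(text):
    """Return the first lag SAP found after each interface line."""
    return INTERFACE_SAP.findall(text)

sample = '''vprn 5001 name "5001" customer 1 create
interface "to-VPLS-6663000" create
description "hypothetical extra line"
sap lag-10:10 create
interface "to-VPLS-3000500" create
sap lag-10:20 create
vpls 3410001 name "XYZBDVLAN1" customer 1 create
sap lag-10:30 create'''

print(find_interface_saps(sample))  # ['lag-10:10', 'lag-10:20']
```

The negative lookahead is checked before every character, so the lazy scan can cross ordinary lines but cannot run into the next block. That is what keeps lag-10:30 out of the result.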
Upvotes: 0
Reputation: 521289
Here is one approach using re.findall in multiline mode:
import re
inp = """ vprn 5001 name "5001" customer 1 create
interface "to-VPLS-6663000" create
sap lag-10:10 create
interface "to-VPLS-3000500" create
sap lag-10:20 create
vpls 3410001 name "XYZBDVLAN1" customer 1 create
sap lag-10:30 create"""
# \s* (rather than \s+) lets the pattern work whether or not the lines are indented
matches = re.findall(r'^\s*\binterface.*?\n\s*sap (lag-\d{1,2}:\d{2})', inp, flags=re.M)
print(matches) # ['lag-10:10', 'lag-10:20']
The regex pattern used above matches a line whose first word is interface, followed immediately by a line containing sap and a lag identifier of the form lag-NN:NN, which is captured.
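Since the question mentions that the text between "interface" and "lag" can vary, here is a rough sketch of an alternative that scans line by line and therefore does not require the sap line to come directly after the interface line. It assumes that "vpls" and "vprn" are the keywords that end an interface context; the function name first_sap_per_interface and the NON_INTERFACE_BLOCKS tuple are hypothetical and would need adjusting for a real configuration.

```python
import re

LAG_SAP = re.compile(r'\bsap\s+(lag-\d+:\d+)')

# Assumption: these keywords open a block in which saps must be ignored.
NON_INTERFACE_BLOCKS = ('vpls', 'vprn')

def first_sap_per_interface(text):
    """Collect the first lag SAP that appears after each interface line."""
    found = []
    armed = False  # True while waiting for the sap of the current interface
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == 'interface':
            armed = True
        elif words[0] in NON_INTERFACE_BLOCKS:
            armed = False
        elif armed:
            match = LAG_SAP.search(line)
            if match:
                found.append(match.group(1))
                armed = False  # capture only once per interface
    return found
```

Called on the inp string above, this should return the same two values as the regex, and it keeps doing so if other lines are inserted between an interface line and its sap line.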
Upvotes: 1