Jegor

Reputation: 25

Python: regex parse text to create dict

I'm stuck on the following task. I have this output from Cisco hardware:

IP access list 100  
        10 permit igmp any any  
        20 deny any any  
IP access list 200  
        10 permit ip 192.168.1.1/32   
        20 permit ip 192.168.2.1/32 any  
        30 permit ip 192.168.3.3/32 any  
        40 deny any any

The task is to build a dict with the access list number as the key and the access list rule number as the value.

acl_dict = {'100' : '10', '100' : '20','200': '10', '200': '20', '200': '30', '200': '40'}

I have written a regex:

import re

rx = re.compile("""
                   list\s(.*)[\n\r]
                   \s{4}(\d{1,3}).+$
                 """, re.MULTILINE | re.VERBOSE)
for match in rx.finditer(text):
    print(match.group(1))
    print(match.group(2))

But it shows only the numbers from the first two lines (100 and 10). I need to modify the regex somehow so that it matches all the numbers and I can build the dict. Can anyone help?

Upvotes: 0

Views: 320

Answers (2)

Alberto Re

Reputation: 514

It's possible to do this in a single pass with the third-party regex module, whose captures() method returns every match of a repeated group:

import regex

text = """
IP access list 100  
    10 permit igmp any any  
    20 deny any any  
IP access list 200  
    10 permit ip 192.168.1.1/32   
    20 permit ip 192.168.2.1/32 any  
    30 permit ip 192.168.3.3/32 any  
    40 deny any any
"""

acl_dict = {}
rx = regex.compile(r"list\s(.+)[\n\r](\s{4}(\d{1,3}).+[\n\r])*", regex.MULTILINE|regex.VERBOSE)
for match in rx.finditer(text):
    acl_dict[match.group(1)] = match.captures(3)

print(acl_dict)

Output:

$ python3 match.py 
{'200  ': ['10', '20', '30', '40'], '100  ': ['10', '20']}
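
For comparison, here is a minimal sketch of why the standard-library re module cannot do the same thing in one pass: a repeated group there keeps only its last repetition. The shortened text and the slightly tightened pattern (`(\d+)[ \t]*` in place of `(.+)`) are assumptions for illustration, not part of the code above.

```python
import re

# Shortened sample, assumed for illustration only.
text = (
    "IP access list 100\n"
    "    10 permit igmp any any\n"
    "    20 deny any any\n"
)

# Same shape as the pattern above, compiled with the standard library.
rx = re.compile(r"list\s(\d+)[ \t]*[\n\r](\s{4}(\d{1,3}).+[\n\r])*")
match = rx.search(text)

acl_number = match.group(1)  # the access list number
last_rule = match.group(3)   # re keeps only the LAST repetition of group 3
print(acl_number, last_rule)
```

With re, the rule number 10 is lost, which is why the captures() call from the regex module is needed for a single-pattern solution.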

Upvotes: 2

Wiktor Stribiżew

Reputation: 626758

You may extract full blocks first, and then get the leading numbers from the inner parts (that can be captured).

Use

r'(?sm)IP access list\s+(\d+)(.*?)(?=^IP access list|\Z)'

Details:

  • (?sm) - enable the DOTALL and MULTILINE modes
  • IP access list - a literal string IP access list (can be prepended with ^ if it is always at the line start)
  • \s+ - 1 or more whitespaces
  • (\d+) - Group 1: one or more digits
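  • (.*?) - Group 2: any 0+ chars, as few as possible, up to the first...
  • (?=^IP access list|\Z) - a positive lookahead that requires IP access list at the start of a line, or the end of the string (\Z), right after the current position, without consuming it.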
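
To make the two groups concrete, here is a minimal sketch that applies the pattern to a shortened sample and inspects the first match. The `sample` string is made up for illustration and is not the original device output.

```python
import re

pattern = r"(?sm)IP access list\s+(\d+)(.*?)(?=^IP access list|\Z)"

# Shortened, hypothetical sample with two blocks.
sample = (
    "IP access list 100\n"
    "    10 permit igmp any any\n"
    "IP access list 200\n"
    "    10 deny any any\n"
)

first = re.search(pattern, sample)
print(repr(first.group(1)))  # the list number of the first block
print(repr(first.group(2)))  # the block body, which stops before the next header
```

Group 2 ends right before the second "IP access list" line because the lazy (.*?) stops expanding as soon as the lookahead succeeds.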

Python sample code:

import re
input_str = "IP access list 100  \n        10 permit igmp any any  \n        20 deny any any  \nIP access list 200  \n        10 permit ip 192.168.1.1/32   \n        20 permit ip 192.168.2.1/32 any  \n        30 permit ip 192.168.3.3/32 any  \n        40 deny any any"
results = {}
for match in re.finditer(r"(?sm)IP access list\s+(\d+)(.*?)(?=^IP access list|\Z)", input_str):
    fields = re.findall(r"(?m)^\s*(\d+)", match.group(2))
    results[match.group(1)] = fields
print(results) # => {'100': ['10', '20'], '200': ['10', '20', '30', '40']} (Python 3.7+ keeps insertion order)
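
Note that the dict literal in the question cannot exist as written, since a dict holds each key only once; that is why the values here are lists. If flat (list number, rule number) pairs are really needed, a list of tuples is one option. A minimal sketch, with the results dict hardcoded from the output above and the names pairs and collapsed being illustrative assumptions:

```python
# Result of the block-based parsing, hardcoded here so the sketch is self-contained.
results = {"100": ["10", "20"], "200": ["10", "20", "30", "40"]}

# Flatten to (acl_number, rule_number) pairs; every pair is preserved.
pairs = [(acl, rule) for acl, rules in results.items() for rule in rules]
print(pairs)

# Turning the pairs back into a plain dict shows the problem:
# repeated keys collapse and only the last rule per list survives.
collapsed = dict(pairs)
print(collapsed)
```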

Upvotes: 1
