sowji

Reputation: 103

How to extract a block of lines from a given file using Python

I have a file like this:

grouping data-rate-parameters {
    description
      "Data rate configuration parameters.";
    reference
      "ITU-T G.997.2 clause 7.2.1.";

    leaf maximum-net-data-rate {
      type bbf-yang:data-rate32;
      default "4294967295";
      description
        "Defines the value of the maximum net data rate (see clause
         11.4.2.2/G.9701).";
      reference
        "ITU-T G.997.2 clause 7.2.1.1 (MAXNDR).";
    }

      leaf psd-level {
        type psd-level;
        description
          "The PSD level of the referenced sub-carrier.";
      }
    }
  }

  grouping line-spectrum-profile {
    description
      "Defines the parameters contained in a line spectrum
       profile.";

    leaf profiles {
      type union {
        type enumeration {
          enum "all" {
            description
              "Used to indicate that all profiles are allowed.";
          }
        }
        type profiles;
      }

I want to extract every leaf block. For example, the leaf maximum-net-data-rate block is:

leaf maximum-net-data-rate {
          type bbf-yang:data-rate32;
          default "4294967295";
          description
            "Defines the value of the maximum net data rate (see clause
             11.4.2.2/G.9701).";
          reference
            "ITU-T G.997.2 clause 7.2.1.1 (MAXNDR).";
        }

I want to extract each leaf block like this.

I tried the code below, which reads the block by counting braces ('{' and '}'):

with open(r'file.txt','r') as f:
    leaf_part = []
    count = 0
    c = 'psd-level'
    for line in f:
        if 'leaf %s {'%c in line:
            cur_line=line
            for line in f:
                pre_line=cur_line
                cur_line=line
                if '{' in pre_line:
                    leaf_part.append(pre_line)
                    count+=1
                elif '}' in pre_line:
                    leaf_part.append(pre_line)
                    count-=1
                elif count==0:
                    break
                else:
                    leaf_part.append(pre_line)

It worked for leaf maximum-net-data-rate, but it does not work for leaf psd-level.

For leaf psd-level, it also returns lines from outside the block.

Please help me achieve this.

Upvotes: 1

Views: 1084

Answers (2)

Oleksandr Muliar

Reputation: 179

You can use regex:

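import re

# `file` must hold the file's contents as a single string
with open('file.txt') as f:
    file = f.read()

reg = re.compile(r"leaf.+?\{.+?\}", re.DOTALL)
reg.findall(file)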

It returns a list of all matched blocks.
If you want to search for a specific leaf name, you can use format (remember to double the curly brackets):

leafname = "maximum-net-data-rate"
reg = re.compile(r"leaf\s{0}.+?\{{.+?\}}".format(leafname), re.DOTALL)

EDIT: for Python 2.7

reg = re.compile(r"leaf\s%s.+?\{.+?\}" % leafname, re.DOTALL)
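
One thing to watch when building the pattern from a name: characters in the name could be read as regex syntax. A minimal sketch of a safer variant, assuming the standard `re.escape` helper is acceptable here (the function name `find_leaf` and the inline `sample` text are made up for illustration). Like the patterns above, it stops at the first `}`, so it only suits blocks without nested braces:

```python
import re

def find_leaf(text, leafname):
    """Return the first block starting with 'leaf <leafname> {', or None."""
    # re.escape keeps every character of the name literal in the pattern.
    pattern = r"leaf\s+%s\s*\{.+?\}" % re.escape(leafname)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(0) if match else None

# Hypothetical stand-in for the contents of file.txt
sample = '''    leaf maximum-net-data-rate {
      type bbf-yang:data-rate32;
      default "4294967295";
    }
'''

result = find_leaf(sample, "maximum-net-data-rate")
print(result)
```

The `%` formatting is used so the same line runs on both Python 2.7 and Python 3.
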

EDIT2: I totally missed that you have nested brackets in your last example.
Handling them is more involved than a simple regex, so you might consider another approach. Still, it is possible.
First, you will need to install the regex module, since the built-in re module does not support recursive patterns.

pip install regex

Second, here is your pattern:

import regex
reg = regex.compile(r"(leaf.*?)({(?>[^\{\}]|(?2))*})", regex.DOTALL)
reg.findall(file)

This pattern returns a list of tuples, so you may want to do something like this:

res = [el[0]+el[1] for el in reg.findall(file)]

This should give you the list of full matches.
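
If installing a third-party module is not an option, nested braces can also be handled with the standard library alone. The following is only a sketch of that other approach, not a replacement for the recursive pattern: it finds each `leaf <name> {` header with `re` and then walks forward to the matching `}`. The function name `leaf_blocks` and the `sample` text (adapted from the question, with the closing brace of `leaf profiles` added) are assumptions, and braces inside quoted strings would be miscounted.

```python
import re

def leaf_blocks(text):
    """Return every 'leaf <name> { ... }' block, including nested braces."""
    blocks = []
    # Find each block header, then scan forward to the matching '}'.
    for match in re.finditer(r"\bleaf\s+\S+\s*\{", text):
        depth = 0
        # match.end() - 1 is the position of the header's '{'
        for pos in range(match.end() - 1, len(text)):
            if text[pos] == "{":
                depth += 1
            elif text[pos] == "}":
                depth -= 1
                if depth == 0:
                    blocks.append(text[match.start():pos + 1])
                    break
    return blocks

# Stand-in for the contents of file.txt
sample = '''
    leaf maximum-net-data-rate {
      type bbf-yang:data-rate32;
    }
    leaf profiles {
      type union {
        type enumeration {
          enum "all" {
            description
              "Used to indicate that all profiles are allowed.";
          }
        }
        type profiles;
      }
    }
'''

blocks = leaf_blocks(sample)
for block in blocks:
    print(block)
    print("---")
```
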

Upvotes: 0

Gahan

Reputation: 4213

It just needs a small change to your break condition. Because of the multiple closing braces '}', your count has already gone negative, so change that line to:

elif count<=0:
    break

But it still appends the extra closing braces to your list. You can handle that by keeping a record of the opening braces, so I changed the code as below:

with open(r'file.txt','r') as f:
    leaf_part = []
    braces_record = []
    count = 0
    c = 'psd-level'
    for line in f:
        if 'leaf %s {'%c in line:
            braces_record.append('{')
            cur_line=line
            for line in f:
                pre_line=cur_line
                cur_line=line
                if '{' in pre_line:
                    braces_record.append('{')
                    leaf_part.append(pre_line)
                    count+=1
                elif '}' in pre_line:
                    try:
                        braces_record.pop()
                        if len(braces_record)>0:
                            leaf_part.append(pre_line)
                    except IndexError:
                        # more closing braces than opening ones; skip the extra
                        pass
                    count-=1
                elif count<=0:
                    break
                elif '}' not in pre_line:
                    leaf_part.append(pre_line)

Result of the above code:

      leaf psd-level {
        type psd-level;
        description
          "The PSD level of the referenced sub-carrier.";
  }
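
The same idea can probably be expressed with a single depth counter instead of `count` plus `braces_record`. This is a rough sketch rather than a drop-in fix: the function name `extract_leaf` and the inline `sample_lines` are made up for illustration, and it assumes braces never appear inside quoted strings. It adds the net number of braces on each line to the depth and stops as soon as the depth returns to zero, so the trailing `}` lines of the enclosing blocks are never read.

```python
def extract_leaf(lines, name):
    """Return the lines of the 'leaf <name> {' block, or [] if absent."""
    block = []
    depth = 0
    header = "leaf %s {" % name
    for line in lines:
        if not block and header not in line:
            continue  # still looking for the block header
        block.append(line)
        depth += line.count("{") - line.count("}")
        if depth == 0:
            break  # the header's brace has been closed
    return block

# Stand-in for iterating over open('file.txt')
sample_lines = '''      leaf psd-level {
        type psd-level;
        description
          "The PSD level of the referenced sub-carrier.";
      }
    }
  }

  grouping line-spectrum-profile {
'''.splitlines(True)

block = extract_leaf(sample_lines, "psd-level")
print("".join(block))
```

With a real file, the call would be `extract_leaf(f, 'psd-level')` inside `with open('file.txt') as f:`, since a file object yields its lines one at a time.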

Upvotes: 1
