Alok

Reputation: 31

Parse a file and create a data structure

We want to parse a file and build a data structure from it for later use in Python. The content of the file looks like this:

plan HELLO
   feature A 
       measure X :
          src = "Type ,Name"
       endmeasure //X

       measure Y :
        src = "Type ,Name"
       endmeasure //Y

       feature Aa
           measure AaX :
              src = "Type ,Name"
                    "Type ,Name2"
                    "Type ,Name3"
           endmeasure //AaX

           measure AaY :
              src = "Type ,Name"
           endmeasure //AaY
           
           feature Aab
              .....
           endfeature // Aab
         
       endfeature //Aa
 
   endfeature // A
   
   feature B
     ......
   endfeature //B
endplan

plan HOLA
endplan //HOLA

So the file contains one or more plans, and each plan contains one or more features. Each feature contains measures that hold info (src, type, name), and a feature can also contain further nested features.

We need to parse the file and create a data structure shaped like this:

                     plan (HELLO) 
            ------------------------------
             ↓                          ↓ 
          Feature A                  Feature B
  ----------------------------          ↓
   ↓           ↓             ↓           ........
Measure X    Measure Y    Feature Aa
                         ------------------------------
                            ↓           ↓             ↓ 
                       Measure AaX   Measure AaY   Feature Aab
                                                        ↓
                                                        .......
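
One way to picture that tree in Python is a set of small dataclasses. This is only a sketch of a possible target shape: the class names `Plan`, `Feature` and `Measure` and their fields are hypothetical and not taken from any existing code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measure:
    name: str
    src: List[str] = field(default_factory=list)  # one entry per quoted src line


@dataclass
class Feature:
    name: str
    measures: List[Measure] = field(default_factory=list)
    features: List["Feature"] = field(default_factory=list)  # nested features


@dataclass
class Plan:
    name: str
    features: List[Feature] = field(default_factory=list)


# Part of plan HELLO from the sample file, built by hand.
hello = Plan(
    "HELLO",
    [
        Feature(
            "A",
            measures=[Measure("X", ["Type ,Name"]), Measure("Y", ["Type ,Name"])],
            features=[Feature("Aa", measures=[Measure("AaX", ["Type ,Name"])])],
        ),
        Feature("B"),
    ],
)
```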

I am trying to parse the file line by line and create a list of lists that would contain plan -> feature -> measure, feature:

def getplans(s):
    stack = [{}]
    stack_list = []
    
    for line in s.splitlines():
        if ": " in line:  # leaf
            temp_stack = {}
            key, value = line.split(": ", 1)
            key = key.replace("source","").replace("=","").replace("\"","").replace(";","")
            value = value.replace("\"","").replace(",","").replace(";","")
            temp_stack[key.strip()] = value.strip()
            stack_list.append(temp_stack)
            stack[-1]["MEASURED_VAL"] = stack_list
        elif line.strip()[:3] == "end":
            stack.pop()
            stack_list = []
        elif line.strip():
            collection, name, *_ = line.split()
            stack.append({})
            stack[-2].setdefault(collection, {})[name] = stack[-1] 
    return stack[0]

Upvotes: 1

Views: 111

Answers (3)

errorcode505

Reputation: 1

It seems like you are trying to parse the structure of this file and create a convenient data structure to represent the hierarchy of plans, features, and measures. Your current method uses a stack to track the nested structure, which is quite reasonable.

There are a few points to note:

Your attempt to remove characters such as "source," "=", ",", and ";" from keys and values looks somewhat unnecessary. If there's no specific reason for this, it might be a good idea to leave them in their original form to preserve data integrity.

It's important to ensure proper handling of the end of blocks (e.g., "endmeasure" and "endfeature"). Adding logic to pop elements from the stack when the end of a block is encountered will help maintain the correct nesting.

Here's an updated version of your code, taking these considerations into account:

def parse_file(s):
    stack = []  # currently open blocks, innermost last
    data = {}

    for line in s.splitlines():
        line = line.strip()

        if line.startswith("plan"):
            plan_name = line.split()[1]
            data[plan_name] = {}
            stack.append(data[plan_name])
        elif line.startswith("feature"):
            feature_name = line.split()[1]
            # attach to the innermost open block so nested features keep their parent
            stack[-1][feature_name] = {}
            stack.append(stack[-1][feature_name])
        elif line.startswith("measure"):
            measure_name = line.split()[1]
            stack[-1][measure_name] = {}
            stack.append(stack[-1][measure_name])
        elif line.startswith(("endmeasure", "endfeature", "endplan")):
            # any end marker closes the innermost open block
            stack.pop()

    return data

This code creates a data structure that reflects the plans, features, and measures in the input file. You can use this data structure for further data operations as needed.

Upvotes: 0

trincot

Reputation: 351369

I didn't get why you have replace calls for source, or ;, nor why you try to create a key MEASURED_VAL, but seeing your previous question, I would just extend the previous answer by making src a list, so that it can collect multiline data:

def getplans(s):
    stack = [{}]
    stack_list = None
    
    for line in s.splitlines():
        if "=" in line:  # leaf
            key, value = line.split("=", 1)
            stack_list = [value.strip(' "')]  # create list for multiple entries
            stack[-1][key.strip()] = stack_list
        elif line.strip()[:3] == "end":
            stack.pop()
            stack_list = None
        elif stack_list is not None:  # continuation of leaf data
            stack_list.append(line.strip(' "'))  # extend the list for `src`
        elif line.strip():
            collection, name, *_ = line.split()
            stack.append({})
            stack[-2].setdefault(collection, {})[name] = stack[-1] 
    return stack[0]
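
Since the structure is meant to be used later, here is a rough sketch of how the nested dictionaries could be walked afterwards. It assumes the result has the shape `{"plan": {name: {"feature": {...}, "measure": {...}}}}` with `src` stored as a list; the helper name `iter_measures` and the hand-written `plans` sample are illustrative only.

```python
def iter_measures(node, path=()):
    """Yield (path, src) for every measure found below `node`."""
    for name, measure in node.get("measure", {}).items():
        yield path + (name,), measure.get("src", [])
    # Plans and features are both containers, so recurse into either kind.
    for kind in ("plan", "feature"):
        for name, child in node.get(kind, {}).items():
            yield from iter_measures(child, path + (name,))


# Hand-written sample in the shape getplans() is assumed to return.
plans = {
    "plan": {
        "HELLO": {
            "feature": {
                "A": {
                    "measure": {"X": {"src": ["Type ,Name"]}},
                    "feature": {
                        "Aa": {
                            "measure": {
                                "AaX": {"src": ["Type ,Name", "Type ,Name2"]}
                            }
                        }
                    },
                }
            }
        },
        "HOLA": {},
    }
}

for path, src in iter_measures(plans):
    print(" > ".join(path), src)
```

Each yielded path starts with the plan name and ends with the measure name, so the parent feature of a measure is simply `path[-2]`.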

Upvotes: 0

Andrej Kesely

Reputation: 195613

Looking at the file, I'd try to convert the plan/feature/measure keywords to tags and then parse the result with an HTML parser, for example BeautifulSoup (or you could convert it to YAML in the same way and use a YAML parser):

text = """\
plan HELLO
   feature A
       measure X :
          src = "Type ,Name"
       endmeasure //X

       measure Y :
        src = "Type ,Name"
       endmeasure //Y

       feature Aa
           measure AaX :
              src = "Type ,Name"
                    "Type ,Name2"
                    "Type ,Name3"
           endmeasure //AaX

           measure AaY :
              src = "Type ,Name"
                    "Type ,Name2"
                    "Type ,Name3"
           endmeasure //AaY

           feature Aab
              .....
           endfeature // Aab

       endfeature //Aa

   endfeature // A

   feature B
     ......
   endfeature //B
endplan

plan HOLA
endplan //HOLA"""

import re

from bs4 import BeautifulSoup

data = re.sub(r"\b(plan|feature|measure)\s+([^:\s]+).*", r'<\g<1> name="\g<2>">', text)
data = re.sub(r"\b(?:end)(plan|feature|measure).*", r"</\g<1>>", data)
data = re.sub(r'src\s*=\s*((?:"[^"]+"\s*)+)', r"<src>\g<1></src>", data)

soup = BeautifulSoup(data, "html.parser")

for m in soup.select("measure"):
    # find parent PLAN:
    print("Plan:", m.find_parent("plan")["name"])
    # find parent feature:
    print("Parent Feature:", m.find_parent("feature")["name"])
    print("Name:", m["name"])
    for line in m.text.splitlines():
        data = list(map(str.strip, line.strip(' "').split(",")))
        if len(data) == 2:
            print(data)

The converted text will be:

<plan name="HELLO">
   <feature name="A">
       <measure name="X">
          <src>"Type ,Name"
       </src></measure>
                                                    
       <measure name="Y">
        <src>"Type ,Name"
       </src></measure>
                                                    
       <feature name="Aa">
           <measure name="AaX">
              <src>"Type ,Name"                
                    "Type ,Name2"
                    "Type ,Name3"
           </src></measure>

           <measure name="AaY">
              <src>"Type ,Name"
                    "Type ,Name2"
                    "Type ,Name3"
           </src></measure>

           <feature name="Aab">
              .....
           </feature>

       </feature>

   </feature>

   <feature name="B">
     ......
   </feature>
</plan>

<plan name="HOLA">
</plan>

And output:

Plan: HELLO
Parent Feature: A
Name: X
['Type', 'Name']
Plan: HELLO
Parent Feature: A
Name: Y
['Type', 'Name']
Plan: HELLO
Parent Feature: Aa
Name: AaX
['Type', 'Name']
['Type', 'Name2']
['Type', 'Name3']
Plan: HELLO
Parent Feature: Aa
Name: AaY
['Type', 'Name']
['Type', 'Name2']
['Type', 'Name3']
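
If a nested dictionary is wanted rather than printed lines, the same tag conversion can feed a small recursive function. The sketch below swaps BeautifulSoup for the standard library's `xml.etree.ElementTree`, which needs a single root element, so the converted text is wrapped in a made-up `<root>` tag. It assumes names and values contain no `<`, `>` or `&`; `to_tree` and `to_dict` are illustrative helper names.

```python
import re
import xml.etree.ElementTree as ET


def to_tree(text):
    # Same three substitutions as above, then wrap in one root element.
    data = re.sub(r"\b(plan|feature|measure)\s+([^:\s]+).*", r'<\g<1> name="\g<2>">', text)
    data = re.sub(r"\b(?:end)(plan|feature|measure).*", r"</\g<1>>", data)
    data = re.sub(r'src\s*=\s*((?:"[^"]+"\s*)+)', r"<src>\g<1></src>", data)
    return ET.fromstring(f"<root>{data}</root>")


def to_dict(elem):
    node = {}
    for child in elem:
        if child.tag == "src":
            # collect every quoted string of a (possibly multiline) src
            node["src"] = re.findall(r'"([^"]*)"', child.text or "")
        else:
            node.setdefault(child.tag, {})[child.get("name")] = to_dict(child)
    return node


sample = """\
plan HELLO
   feature A
       measure X :
          src = "Type ,Name"
                "Type ,Name2"
       endmeasure //X
       feature Aa
       endfeature //Aa
   endfeature // A
endplan

plan HOLA
endplan //HOLA"""

result = to_dict(to_tree(sample))
print(result)
```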

Upvotes: 0
