Reputation: 31
We want to parse a file and create a data structure of some sort to be used later (in Python). The content of the file looks like this:
plan HELLO
feature A
measure X :
src = "Type ,Name"
endmeasure //X
measure Y :
src = "Type ,Name"
endmeasure //Y
feature Aa
measure AaX :
src = "Type ,Name"
"Type ,Name2"
"Type ,Name3"
endmeasure //AaX
measure AaY :
src = "Type ,Name"
endmeasure //AaY
feature Aab
.....
endfeature // Aab
endfeature //Aa
endfeature // A
feature B
......
endfeature //B
endplan
plan HOLA
endplan //HOLA
So there's a file that contains one or more plans. Each plan contains one or more features, each feature contains measures that hold info (src, type, name), and a feature can in turn contain more features.
We need to parse through the file and create a data structure that would have
plan (HELLO)
├── Feature A
│   ├── Measure X
│   ├── Measure Y
│   └── Feature Aa
│       ├── Measure AaX
│       ├── Measure AaY
│       └── Feature Aab
│           └── .......
└── Feature B
    └── ........
I am trying to parse the file line by line and create a list of lists that would contain plan -> feature -> measure, feature:
def getplans(s):
    stack = [{}]
    stack_list = []
    for line in s.splitlines():
        if ": " in line:  # leaf
            temp_stack = {}
            key, value = line.split(": ", 1)
            key = key.replace("source","").replace("=","").replace("\"","").replace(";","")
            value = value.replace("\"","").replace(",","").replace(";","")
            temp_stack[key.strip()] = value.strip()
            stack_list.append(temp_stack)
            stack[-1]["MEASURED_VAL"] = stack_list
        elif line.strip()[:3] == "end":
            stack.pop()
            stack_list = []
        elif line.strip():
            collection, name, *_ = line.split()
            stack.append({})
            stack[-2].setdefault(collection, {})[name] = stack[-1]
    return stack[0]
Upvotes: 1
Views: 111
Reputation: 1
It seems like you are trying to parse the structure of this file and create a convenient data structure to represent the hierarchy of plans, features, and measures. Your current method uses a stack to track the nested structure, which is quite reasonable.
There are a few points to note:
Your attempt to remove strings such as "source", "=", ",", and ";" from keys and values looks unnecessary. If there's no specific reason for it, it is better to leave them in their original form to preserve data integrity.
It's important to ensure proper handling of the end of blocks (e.g., "endmeasure" and "endfeature"). Adding logic to pop elements from the stack when the end of a block is encountered will help maintain the correct nesting.
Here's an updated version of your code, taking these considerations into account:
def parse_file(s):
    data = {}
    stack = [data]  # stack[-1] is always the innermost open block
    for line in s.splitlines():
        line = line.strip()
        if line.startswith(("plan", "feature", "measure")):
            # Attach the new block to its enclosing block, so that
            # features nested inside features end up in the right place
            name = line.split()[1]
            stack[-1][name] = {}
            stack.append(stack[-1][name])
        elif line.startswith(("endplan", "endfeature", "endmeasure")):
            # Close the innermost open block
            stack.pop()
    return data
This code builds a nested dictionary keyed by plan, feature, and measure name. Note that it does not store the src lines. You can use this data structure for further data operations as needed.
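To check that the nesting came out as intended, a small recursive walk over the resulting dictionary can help. This is only a sketch: walk is a hypothetical helper, and the sample dictionary is hand-written to mimic the nested shape described above, not actual parser output.

```python
def walk(node, depth=0):
    """Yield (depth, name) for every block in a nested dict of dicts."""
    for name, child in node.items():
        yield depth, name
        if isinstance(child, dict):
            yield from walk(child, depth + 1)


# Hand-written sample in the assumed shape: plan -> feature -> measure/feature
sample = {
    "HELLO": {
        "A": {"X": {}, "Y": {}, "Aa": {"AaX": {}, "AaY": {}}},
        "B": {},
    },
    "HOLA": {},
}

for depth, name in walk(sample):
    # Indent each name by its nesting depth
    print("  " * depth + name)
```

Each name is printed indented by its depth, so a feature that ended up under the wrong parent is easy to spot.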
Upvotes: 0
Reputation: 351369
I didn't get why you have replace calls for "source" or ";", nor why you try to create a key MEASURED_VAL, but seeing your previous question, I would just extend the previous answer by making src a list, so that it can collect multiline data:
def getplans(s):
    stack = [{}]
    stack_list = None
    for line in s.splitlines():
        if "=" in line:  # leaf
            key, value = line.split("=", 1)
            stack_list = [value.strip(' "')]  # create list for multiple entries
            stack[-1][key.strip()] = stack_list
        elif line.strip()[:3] == "end":
            stack.pop()
            stack_list = None
        elif stack_list is not None:  # continuation of leaf data
            stack_list.append(line.strip(' "'))  # extend the list for `src`
        elif line.strip():
            collection, name, *_ = line.split()
            stack.append({})
            stack[-2].setdefault(collection, {})[name] = stack[-1]
    return stack[0]
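If you later need the type and the name separately, each collected src entry can be split on its comma. A minimal sketch, assuming every entry has the "Type ,Name" form from the sample (exactly one separating comma); split_src is a hypothetical helper name, not part of the code above.

```python
def split_src(entries):
    """Turn ['Type ,Name', ...] into [('Type', 'Name'), ...]."""
    pairs = []
    for entry in entries:
        # partition splits on the first comma only (assumed format: "Type ,Name")
        type_, _, name = entry.partition(",")
        pairs.append((type_.strip(), name.strip()))
    return pairs


print(split_src(["Type ,Name", "Type ,Name2", "Type ,Name3"]))
```

An entry without a comma ends up with an empty name, so you may want to validate the input if the real files are less regular than the sample.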
Upvotes: 0
Reputation: 195613
Looking at the file, I'd try to convert the plan/feature/measure blocks to tags and then parse the result with an HTML parser, for example beautifulsoup (or you can try the same with YAML and then use a YAML parser):
text = """\
plan HELLO
feature A
measure X :
src = "Type ,Name"
endmeasure //X
measure Y :
src = "Type ,Name"
endmeasure //Y
feature Aa
measure AaX :
src = "Type ,Name"
"Type ,Name2"
"Type ,Name3"
endmeasure //AaX
measure AaY :
src = "Type ,Name"
"Type ,Name2"
"Type ,Name3"
endmeasure //AaY
feature Aab
.....
endfeature // Aab
endfeature //Aa
endfeature // A
feature B
......
endfeature //B
endplan
plan HOLA
endplan //HOLA"""
import re
from bs4 import BeautifulSoup

# "plan NAME" / "feature NAME" / "measure NAME :" -> opening tag with a name attribute
data = re.sub(r"\b(plan|feature|measure)\s+([^:\s]+).*", r'<\g<1> name="\g<2>">', text)
# "endplan" / "endfeature" / "endmeasure" (plus any trailing comment) -> closing tag
data = re.sub(r"\b(?:end)(plan|feature|measure).*", r"</\g<1>>", data)
# wrap the quoted src lines in a <src> tag
data = re.sub(r'src\s*=\s*((?:"[^"]+"\s*)+)', r"<src>\g<1></src>", data)
soup = BeautifulSoup(data, "html.parser")
for m in soup.select("measure"):
    # find the parent plan:
    print("Plan:", m.find_parent("plan")["name"])
    # find the closest enclosing feature:
    print("Parent Feature:", m.find_parent("feature")["name"])
    print("Name:", m["name"])
    for line in m.text.splitlines():
        parts = list(map(str.strip, line.strip(' "').split(",")))
        if len(parts) == 2:
            print(parts)
The converted text will be:
<plan name="HELLO">
<feature name="A">
<measure name="X">
<src>"Type ,Name"
</src></measure>
<measure name="Y">
<src>"Type ,Name"
</src></measure>
<feature name="Aa">
<measure name="AaX">
<src>"Type ,Name"
"Type ,Name2"
"Type ,Name3"
</src></measure>
<measure name="AaY">
<src>"Type ,Name"
"Type ,Name2"
"Type ,Name3"
</src></measure>
<feature name="Aab">
.....
</feature>
</feature>
</feature>
<feature name="B">
......
</feature>
</plan>
<plan name="HOLA">
</plan>
And output:
Plan: HELLO
Parent Feature: A
Name: X
['Type', 'Name']
Plan: HELLO
Parent Feature: A
Name: Y
['Type', 'Name']
Plan: HELLO
Parent Feature: Aa
Name: AaX
['Type', 'Name']
['Type', 'Name2']
['Type', 'Name3']
Plan: HELLO
Parent Feature: Aa
Name: AaY
['Type', 'Name']
['Type', 'Name2']
['Type', 'Name3']
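If you'd rather not depend on beautifulsoup, the same converted text can probably be fed to the standard library's xml.etree.ElementTree as well. A rough sketch, assuming the names never contain characters that need escaping in XML (such as & or <) and using a shortened sample. The root wrapper tag and the measures helper are names made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Shortened sample in the same format as the file above
text = '''\
plan HELLO
feature A
measure X :
src = "Type ,Name"
"Type ,Name2"
endmeasure //X
feature Aa
measure AaX :
src = "Type ,Name3"
endmeasure //AaX
endfeature //Aa
endfeature // A
endplan
plan HOLA
endplan //HOLA'''

# Same three substitutions as above
markup = re.sub(r"\b(plan|feature|measure)\s+([^:\s]+).*", r'<\g<1> name="\g<2>">', text)
markup = re.sub(r"\bend(plan|feature|measure).*", r"</\g<1>>", markup)
markup = re.sub(r'src\s*=\s*((?:"[^"]+"\s*)+)', r"<src>\g<1></src>", markup)

# XML needs a single root element, so wrap all plans in one (assumed name: root)
root = ET.fromstring(f"<root>{markup}</root>")


def measures(elem, path=()):
    """Yield (path, entries) for every <measure>; path holds the enclosing names."""
    for child in elem:
        here = path + (child.get("name"),)
        if child.tag == "measure":
            src = child.findtext("src", default="")
            entries = [line.strip(' "') for line in src.splitlines() if line.strip()]
            yield here, entries
        else:
            # plan or feature: keep descending
            yield from measures(child, here)


for path, entries in measures(root):
    print(" / ".join(path), entries)
```

ElementTree elements have no parent pointers, so the sketch carries the path of enclosing names down the recursion instead of looking parents up afterwards.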
Upvotes: 0