Reputation: 15
I have formatted all my university notes as follows:
CourseName: {
Part 1: {
I.I - Intro: {
Topic1: {
descr1;
descr2: {
2.a;
2.b;
2.c.
};
descr3.
};
Topic2: {
descr: {
example.
}.
}.
};
I.II - NextChapter: {
Topic3: {
whatever.
}.
}.
};
Part 2: {
II.I - FinalChapter: {
content.
}.
}.
}
I'd like to structure them into a Tree data structure and I've tried doing so, both recursively and iteratively, in the past hours, doing many researches online, but none of my attempts at it is working.
I've already implemented a Node class (with self.__value
and a list self.__children
and all the useful methods you would expect from it) as well as a Tree class (with self.__nodes
as a dictionary and other utility methods), so feel free to use methods such as add_node
or add_child
in any form of your liking in your answers.
What I'm struggling with is to understand how to structure the function def parseTree(s, l)
- that ideally takes as inputs a string s
(my notes) and a list l
establishing the delimiters i.e. [":{", ";", "}."]
or ["{","}"]
or similar - and returns a tree object, with each node having as value the text preceding :{
and a list of children (if any) separated by ;
in the text.
Any suggestion?
Upvotes: 1
Views: 3827
Reputation: 71451
Assuming your data is stored in a file, you can build a simple class to parse the structure into a dictionary. You can recursively traverse the data by creating a new Notes
object for each key found:
file_data = filter(None, [i.strip('\n') for i in open('filename.txt')])
import re
class Notes:
def __init__(self, token_data):
self.token_data = token_data
self.current_dict = {}
self.current_vals = []
self.parse()
def parse(self):
while True:
start = next(self.token_data, None)
if not start or "}" in start:
break
if start.endswith('{'):
note = Notes(self.token_data)
final_result = filter(lambda x:x, note.current_vals + [note.current_dict]) if note.current_vals else note.current_dict
self.current_dict[re.findall('[\w\s\-\.]+', re.sub('^\s+', '', start))[0]] = final_result[0] if isinstance(final_result, list) and len(final_result) == 1 else final_result
self.token_data = note.token_data
else:
self.current_vals.append(re.sub('^\s+', '', start))
course_notes = Notes(iter(file_data)).current_dict
Output:
{'CourseName':
{'Part 1':
{'I.I - Intro':
{'Topic1': ['descr1;',
'descr3.',
{'descr2': ['2.a;',
'2.b;',
'2.c.']
}
],
'Topic2': {'descr': 'example.'}
},
'I.II - NextChapter':
{'Topic3': 'whatever.'}
},
'Part 2':{'II.I - FinalChapter': 'content.'}
}
}
Upvotes: 2
Reputation: 599610
This is actually almost syntactically valid YAML. A simple substitution will make it valid:
data = data.replace(';', ',').replace('.', '')
parsed = yaml.load(data)
Upvotes: 4