Flynn Rhodes
Flynn Rhodes

Reputation: 15

How to parse a tree from a string in Python?

I have formatted all my university notes as follows:

CourseName: {
    Part 1: {
        I.I - Intro: {
            Topic1: {
                descr1;
                descr2: {
                    2.a;
                    2.b;
                    2.c.
                };
                descr3.
            };
            Topic2: {
                descr: {
                    example.
                }.
            }.
        };
        I.II - NextChapter: {
            Topic3: {
                whatever.
            }.
        }.
    };
    Part 2: {
        II.I - FinalChapter: {
            content.
        }.
    }.
}

I'd like to structure them into a Tree data structure and I've tried doing so, both recursively and iteratively, in the past hours, doing many researches online, but none of my attempts at it is working.

I've already implemented a Node class (with self.__value and a list self.__children and all the useful methods you would expect from it) as well as a Tree class (with self.__nodes as a dictionary and other utility methods), so feel free to use methods such as add_node or add_child in any form of your liking in your answers.

What I'm struggling with is to understand how to structure the function def parseTree(s, l) - that ideally takes as inputs a string s (my notes) and a list l establishing the delimiters i.e. [":{", ";", "}."] or ["{","}"] or similar - and returns a tree object, with each node having as value the text preceding :{ and a list of children (if any) separated by ; in the text.

Any suggestion?

Upvotes: 1

Views: 3827

Answers (2)

Ajax1234
Ajax1234

Reputation: 71451

Assuming your data is stored in a file, you can build a simple class to parse the structure into a dictionary. You can recursively traverse the data by creating a new Notes object for each key found:

file_data = filter(None, [i.strip('\n') for i in open('filename.txt')])
import re
class Notes:
   def __init__(self, token_data):
     self.token_data = token_data
     self.current_dict = {}
     self.current_vals = []
     self.parse()
   def parse(self):
     while True:
       start = next(self.token_data, None)
       if not start or "}" in start:
         break
       if start.endswith('{'):
          note = Notes(self.token_data)
          final_result = filter(lambda x:x, note.current_vals + [note.current_dict]) if note.current_vals else note.current_dict
          self.current_dict[re.findall('[\w\s\-\.]+', re.sub('^\s+', '', start))[0]] = final_result[0] if isinstance(final_result, list) and len(final_result) == 1 else final_result
          self.token_data = note.token_data
       else:
          self.current_vals.append(re.sub('^\s+', '', start))


course_notes = Notes(iter(file_data)).current_dict

Output:

{'CourseName': 
    {'Part 1': 
      {'I.I - Intro': 
         {'Topic1': ['descr1;',
                    'descr3.',
             {'descr2': ['2.a;',
                         '2.b;',
                          '2.c.']
                }
               ],
         'Topic2': {'descr': 'example.'}
                 },
                  'I.II - NextChapter': 
               {'Topic3': 'whatever.'}
             },
       'Part 2':{'II.I - FinalChapter': 'content.'}
     } 
   }

Upvotes: 2

Daniel Roseman
Daniel Roseman

Reputation: 599610

This is actually almost syntactically valid YAML. A simple substitution will make it valid:

data = data.replace(';', ',').replace('.', '')
parsed = yaml.load(data)

Upvotes: 4

Related Questions