How to parse a tree from a string in Python?

Question

I have formatted all my university notes as follows:

CourseName: {
    Part 1: {
        I.I - Intro: {
            Topic1: {
                descr1;
                descr2: {
                    2.a;
                    2.b;
                    2.c.
                };
                descr3.
            };
            Topic2: {
                descr: {
                    example.
                }.
            }.
        };
        I.II - NextChapter: {
            Topic3: {
                whatever.
            }.
        }.
    };
    Part 2: {
        II.I - FinalChapter: {
            content.
        }.
    }.
}

I'd like to structure them into a Tree data structure and I've tried doing so, both recursively and iteratively, in the past hours, doing many researches online, but none of my attempts at it is working.

I've already implemented a Node class (with self.__value and a list self.__children and all the useful methods you would expect from it) as well as a Tree class (with self.__nodes as a dictionary and other utility methods), so feel free to use methods such as add_node or add_child in any form of your liking in your answers.

What I'm struggling with is to understand how to structure the function def parseTree(s, l) - that ideally takes as inputs a string s (my notes) and a list l establishing the delimiters i.e. [":{", ";", "}."] or ["{","}"] or similar - and returns a tree object, with each node having as value the text preceding :{ and a list of children (if any) separated by ; in the text.

Any suggestion?

Ajax1234 · Accepted Answer

Assuming your data is stored in a file, you can build a simple class to parse the structure into a dictionary. You can recursively traverse the data by creating a new Notes object for each key found:

file_data = filter(None, [i.strip('
') for i in open('filename.txt')])
import re
class Notes:
   def __init__(self, token_data):
     self.token_data = token_data
     self.current_dict = {}
     self.current_vals = []
     self.parse()
   def parse(self):
     while True:
       start = next(self.token_data, None)
       if not start or "}" in start:
         break
       if start.endswith('{'):
          note = Notes(self.token_data)
          final_result = filter(lambda x:x, note.current_vals + [note.current_dict]) if note.current_vals else note.current_dict
          self.current_dict[re.findall('[\w\s\-\.]+', re.sub('^\s+', '', start))[0]] = final_result[0] if isinstance(final_result, list) and len(final_result) == 1 else final_result
          self.token_data = note.token_data
       else:
          self.current_vals.append(re.sub('^\s+', '', start))


course_notes = Notes(iter(file_data)).current_dict

Output:

{'CourseName': 
    {'Part 1': 
      {'I.I - Intro': 
         {'Topic1': ['descr1;',
                    'descr3.',
             {'descr2': ['2.a;',
                         '2.b;',
                          '2.c.']
                }
               ],
         'Topic2': {'descr': 'example.'}
                 },
                  'I.II - NextChapter': 
               {'Topic3': 'whatever.'}
             },
       'Part 2':{'II.I - FinalChapter': 'content.'}
     } 
   }

How to parse a tree from a string in Python?

Answers (2)

Related Questions