xeor

Reputation: 5455

Parsing space-indented data

I have some data that looks like YAML, but isn't. Here is an example:

An instance of A
  objectID=123
  family=abc

An instance of A
  objectID=234
  family=bcd
  List of 4 X elements:
    An instance of X:
      objectID=222
      name=ccc
    An instance of X:
      objectID=333

And so on...

I need to find a way to make it look more like this:

[
  {'name': 'An instance of A',
   'data': [
     {'objectID': 123,
      'family': 'abc'
     }
   ]
 },
 ...

I have tried to write a recursive function to parse this, but it ended up being a mess.

I'm not asking for a complete working example, but what is the best way to do this in Python? A recursive function? Another library (which I haven't found yet)? Another language to do the parsing, with the whole thing embedded in Python?

Upvotes: 1

Views: 145

Answers (1)

Martijn Pieters

Reputation: 1121486

Use a stack, and push and pop items from it as you find more or fewer levels of indentation; each level on the stack holds the indentation depth and the entry:

stack = [(0, {})]  # (indentation level, entry); starts with the top-level entry
entry = stack[-1][1]

for line in input:  # input: any iterable of lines, e.g. an open file
    if not line.strip():
        continue  # skip blank lines

    # measure the indentation before stripping the line
    indentation = len(line) - len(line.lstrip())
    line = line.strip()
    if indentation > stack[-1][0]:  # indented further? New entry
        entry = stack[-1][1]['data'] = {}
        stack.append((indentation, entry))  # push
    else:
        while indentation < stack[-1][0]:  # indentation dropped
            del stack[-1]       # pop
            entry = stack[-1][1]

    # process line and add to entry

result = stack[0][1]
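
For illustration, here is one way the stack idea could be fleshed out into a runnable function. This is only a sketch under a few assumptions that are not in the question: the function name parse_indented is made up, a line containing '=' is treated as a key/value pair belonging to the nearest less-indented header, any other line is treated as a header that opens a new node, and each node is stored as a dict with 'name', 'data' and 'children' keys (which differs slightly from the layout shown in the question). Adjust the node layout to whatever output you actually need.

```python
def parse_indented(lines):
    """Parse indentation-structured text into nested dicts (illustrative layout)."""
    root = {'name': None, 'data': {}, 'children': []}
    stack = [(-1, root)]  # (indentation of the header line, node)

    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        indentation = len(raw) - len(raw.lstrip())
        text = raw.strip()

        # Pop every node that is not less indented than this line;
        # what remains on top of the stack is this line's parent.
        while indentation <= stack[-1][0]:
            stack.pop()
        parent = stack[-1][1]

        if '=' in text:
            # Assumption: key=value lines are attributes of the parent node.
            key, _, value = text.partition('=')
            parent['data'][key] = int(value) if value.isdecimal() else value
        else:
            # Assumption: anything else is a header that starts a new node.
            node = {'name': text.rstrip(':'), 'data': {}, 'children': []}
            parent['children'].append(node)
            stack.append((indentation, node))

    return root['children']


sample = """\
An instance of A
  objectID=123
  family=abc

An instance of A
  objectID=234
  family=bcd
  List of 4 X elements:
    An instance of X:
      objectID=222
      name=ccc
    An instance of X:
      objectID=333
"""

result = parse_indented(sample.splitlines())
print(result[0]['data'])  # {'objectID': 123, 'family': 'abc'}
```

The root node uses an indentation of -1 so that it is never popped, which keeps the loop free of special cases for top-level lines.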

Upvotes: 3
