Mohamed Ayoob

Reputation: 25

Parse a custom text file in Python

I have a text file to parse; this is a condensed version of its contents.

apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}

The output should be as shown below:

apple.type=fruit
apple.varieties.color=red
apple.varieties.origin=usa

So far, the only thing I have come up with is a sort of breadth-first approach in Python, but I can't figure out how to get all the nested children.

progInput = """apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
"""
progInputSplitToLines = progInput.split('\n')
childrenList = []
root = ""

def hasChildren():
    if "{" in progInputSplitToLines[0]:
        global root
        root = progInputSplitToLines[0].split(" ")[0]
    for e in progInputSplitToLines[1:]:
        if "=" in e:
            childrenList.append({e.split("=")[0].replace("    ", ""),e.split("=")[1].replace("    ", "")})
hasChildren()

PS: I looked into tree structures in Python and came across anytree (https://anytree.readthedocs.io/en/latest/). Do you think it would help in my case?

Could you please help me out? I'm not very good at parsing text. Thanks a bunch in advance. :)

Upvotes: 0

Views: 2215

Answers (1)

RoadRunner

Reputation: 26315

Since your file looks like it is in HOCON format, you can try using the pyhocon parser module to solve your problem.

Install: either run pip install pyhocon, or download the GitHub repo and install it manually with python setup.py install.

Basic usage:

from pyhocon import ConfigFactory

conf = ConfigFactory.parse_file('text.conf')

print(conf)

Which gives the following nested structure:

ConfigTree([('apple', ConfigTree([('type', 'fruit'), ('varieties', ConfigTree([('color', 'red'), ('origin', 'usa')]))]))])

ConfigTree is a subclass of collections.OrderedDict, as seen in the source code, so you can iterate over it and index it like a regular dictionary.

UPDATE:

To get your desired output, you can make your own recursive function to collect all paths:

from pyhocon import ConfigFactory
from pyhocon.config_tree import ConfigTree

def config_paths(config):
    for k, v in config.items():
        if isinstance(v, ConfigTree):
            for k1, v1 in config_paths(v):
                yield (k,) + k1, v1
        else:
            yield (k,), v

config = ConfigFactory.parse_file('text.conf')
for k, v in config_paths(config):
    print('%s=%s' % ('.'.join(k), v))

Which outputs:

apple.type=fruit
apple.varieties.color=red
apple.varieties.origin=usa
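
If installing pyhocon is not an option, the same flattening can be sketched with the standard library alone. This is only a minimal sketch, not a HOCON parser: it assumes every block opens with a line ending in "{", closes with a line containing only "}", and that every other non-empty line is a single key=value pair, as in your sample. The function name flatten_blocks is made up for this example.

```python
def flatten_blocks(text):
    """Return 'a.b.c=value' strings for brace-nested key=value text.

    Assumes the simple layout from the question: 'name {' opens a block,
    a lone '}' closes it, and all other non-empty lines are 'key=value'.
    """
    path = []    # names of the blocks we are currently inside
    result = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.endswith("{"):
            # Entering a nested block: remember its name.
            path.append(line[:-1].strip())
        elif line == "}":
            # Leaving the innermost block.
            path.pop()
        else:
            # Split on the first '=' only, so values may contain '='.
            key, _, value = line.partition("=")
            result.append(".".join(path + [key.strip()]) + "=" + value.strip())
    return result


sample = """apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
"""

for entry in flatten_blocks(sample):
    print(entry)
```

The list named path acts as a stack of the enclosing block names, which is why no explicit tree (or anytree) is needed here. Anything beyond this layout, such as comments, quoted strings, or several pairs on one line, is better left to pyhocon.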

Upvotes: 1
