user2322491

Reputation: 109

Python parse complex command output

I need to parse the output of a command in Python. The command returns something like this:

A:
        2 bs found
        3 cs found
B:
        1 a found
        3 bs found
C:
        1 c found
        D:
                2 es found
                3 fs found

I need to be able to do the following with the output:

Access values such as a.bs found, b.a found, c.d.es found, and so on.

How do I do this in Python? What data structure is best suited to this?

The goal of this exercise is to run the command every 10 seconds and produce a diff of what has changed.

Upvotes: 0

Views: 1529

Answers (2)

kampu

Reputation: 1421

An alternative solution is to translate the input string directly into something that a pre-existing library can read. This particular data looks like a good fit for YAML.

In this case you would call re.sub('( +)([0-9]+) ([a-z]).+', '\\1\\3 : \\2', allcontent), which rewrites each '2 cs found'-style line into a key : value mapping that PyYAML understands. To be precise, '2 cs found' becomes 'c : 2'. (The count is matched with [0-9]+ rather than [1-9]+ so that counts containing a zero, such as 10, are also rewritten.)

The result:

A:
        b : 2
        c : 3
B:
        a : 1
        b : 3
C:
        c : 1
        D:
                e : 2
                f : 3

Calling yaml.safe_load(newcontent) returns the following Python data structure (recent PyYAML versions require an explicit Loader argument for plain yaml.load):

{'A': {'b': 2, 'c': 3},
 'B': {'a': 1, 'b': 3},
 'C': {'D': {'e': 2, 'f': 3}, 'c': 1}}

This matches the suggestion in my earlier comment. If you prefer JSON (Python comes with a json module), it's pretty simple to adapt this strategy to produce JSON instead.
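A minimal sketch of the rewrite step described above, assuming the command output is indented with spaces and every count line looks like '<count> <letter>s found'. The function names to_yaml_text and load_report are made up for illustration, and load_report needs the third-party PyYAML package installed.

```python
import re


def to_yaml_text(raw):
    """Rewrite '<count> <letter>s found' lines as '<letter> : <count>'."""
    # ^ and $ match per line because of re.MULTILINE; the leading spaces
    # are captured and written back so the nesting is preserved.
    return re.sub(r'^( +)([0-9]+) ([a-z]).*$', r'\1\3 : \2', raw,
                  flags=re.MULTILINE)


def load_report(raw):
    """Parse the rewritten text with PyYAML (assumed to be installed)."""
    import yaml  # third-party: PyYAML
    return yaml.safe_load(to_yaml_text(raw))
```

Header lines such as 'A:' are left untouched because they do not start with spaces followed by a digit. Since the result is a plain nested dict, two snapshots taken 10 seconds apart can be compared directly with ==.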

Upvotes: 2

kampu

Reputation: 1421

This should have a 'parsing' tag as it's a general parsing problem.

The usual solution in this kind of situation is to track, as you read in lines, a) the indentation and b) the list of structures that are currently being parsed. b begins as a list containing a single empty dict, i.e. curparsing = [{}]

Loop over all input lines. For example:

with open('inputfilename','r') as f:
    for line in f:
        # code implementing the below rules.
  • if a line is blank (if not line.strip():), ignore it and go onto the next one (continue)

  • if the indentation level has decreased, remove the top item from the currently-parsing list (i.e. curparsing.pop()). If the indentation has dropped by several levels at once, remove one item per level.

  • strip off any leading indentation with line=line.lstrip()

  • if ':' is in the line, then we've found a sub-dictionary. Read the key(the part to the left of ':'), increase the indent-level, create a new dictionary, and insert it into the dictionary at the current top of the list. Then append our newly-created dictionary to the list.

  • if line[0].isdigit(): then we have found a report of the form '[count] [character]s found'. We can use a regular expression to extract the count and the character, with m = re.match('([0-9]+) ([a-z])', line); count, character = m.groups(); count = int(count). We then store this in the dictionary at the current top of the list: curparsing[-1][character] = count

That's pretty much it. You just loop over lines and apply these rules to each line, and at the end, curparsing[0] contains the parsed document.
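Put together, the rules above could look roughly like this. It is a sketch rather than a finished parser: it assumes consistent indentation, and instead of a separate indent counter it keeps each open dictionary's indentation next to it on the stack. The name parse_report is made up for illustration.

```python
import re

COUNT_RE = re.compile(r'([0-9]+) ([a-z])')


def parse_report(lines):
    """Build nested dicts from indented 'X:' / '<count> <letter>s found' lines."""
    root = {}
    # Stack of (indent, dict) pairs. The root sits at indent -1 so that
    # it is never popped.
    stack = [(-1, root)]
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip())
        text = line.strip()
        # Leave every section indented at least as deeply as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        current = stack[-1][1]
        if text.endswith(':'):
            # A header such as 'A:' opens a new sub-dictionary.
            child = {}
            current[text[:-1]] = child
            stack.append((indent, child))
        else:
            m = COUNT_RE.match(text)
            if m:
                count, character = m.groups()
                current[character] = int(count)
    return root
```

With the sample output from the question, parse_report(output.splitlines())['C']['D']['e'] gives 2. Because the result is made of plain dicts, comparing two successive snapshots with == is enough to tell whether anything changed.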

Upvotes: 0
