Parsing key-value log in Python

I have a system log that looks like the following:

{
    a = 1
    b = 2
    c = [
            x:1,
            y:2,
            z:3,
        ]
    d = 4
}

I want to parse this in Python into a dictionary object, with = separating the key-value pairs, while also preserving the array enclosed in []. I want to keep this as generic as possible so the parsing can also handle future variations.

What I have tried so far (I haven't written the code yet): split each line on "=" into a key-value pair, find where [ and ] start and end, and then split the lines in between on ":" into key-value pairs. That seems a little hard-coded. Any better ideas?
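
The approach described above can be sketched roughly as follows. This is only a minimal sketch under the assumption that the log looks exactly like the sample: parse_log is a hypothetical name, each entry of a [ ... ] block sits on its own line, values are kept as strings, and nested brackets are not supported.

```python
def parse_log(text):
    """Hypothetical helper sketching the line-by-line approach.

    Assumes one "key = value" pair per line, and that a value of "[" opens a
    block of "name:value," lines which ends at a line starting with "]".
    Values stay strings; nested brackets are not handled.
    """
    result = {}
    block_key = None  # key whose [ ... ] block we are currently inside
    for raw in text.splitlines():
        line = raw.strip()
        if line in ("", "{", "}"):
            continue  # skip blank lines and the outer braces
        if block_key is not None:
            if line.startswith("]"):
                block_key = None  # the bracketed block is finished
            else:
                # Inside the block: "x:1," -> ("x", "1")
                name, value = line.rstrip(",").split(":", 1)
                result[block_key][name.strip()] = value.strip()
            continue
        # Top level: "a = 1" -> ("a", "1")
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if value == "[":
            result[key] = {}
            block_key = key  # start collecting the block's entries
        else:
            result[key] = value
    return result


SAMPLE = """{
    a = 1
    b = 2
    c = [
            x:1,
            y:2,
            z:3,
        ]
    d = 4
}"""

print(parse_log(SAMPLE))
# {'a': '1', 'b': '2', 'c': {'x': '1', 'y': '2', 'z': '3'}, 'd': '4'}
```

The "hard-coded" feeling comes from the assumptions listed in the docstring: a different layout (for example, several entries on one line inside [ ]) would need changes to the splitting rules.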

Answers (2)

Luke Taylor

This can fairly easily be converted into YAML. Install PyYAML (pip install pyyaml), then set up the data like so:

import string
import yaml

data = """
{
    a = 1
    b = 2
    c = [
            x:1,
            y:2,
            z:3,
        ]
    d = 4
}
"""

With this setup, you can use the following to parse your data:

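data2 = data.replace(":", ": ").replace("=", ":").replace("[", "{").replace("]", "}")

lines = data2.splitlines()
for i, line in enumerate(lines):
    # Case 1: the line ends with a number and has no comma yet
    ends_with_number = len(line) > 0 and line[-1] in string.digits and not line.endswith(",")
    # Case 2: the line closes a nested mapping and is not the final line
    closes_inner_mapping = i < len(lines) - 1 and line.endswith("}")
    if ends_with_number or closes_inner_mapping:
        lines[i] += ","
data3 = "\n".join(lines)
yaml.safe_load(data3)  # {'a': 1, 'b': 2, 'c': {'x': 1, 'y': 2, 'z': 3}, 'd': 4}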

Explanation

In the first line, we perform some simple substitutions:

  • YAML requires a space after the colon in a key/value pair, so replace(":", ": ") ensures there is one.
  • YAML key/value pairs are always separated by a colon, while your format sometimes uses equals signs, so we replace equals signs with colons using .replace("=", ":")
  • Your format sometimes uses square brackets where curly brackets should be used in YAML. We fix using .replace("[","{").replace("]","}")

At this point, your data looks like this:

{
    a : 1
    b : 2
    c : {
            x: 1,
            y: 2,
            z: 3,
        }
    d : 4
}
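
To see what the substitutions do to individual lines, here is a small sketch; normalise is a hypothetical name wrapping the same replace chain used in the first line of the code above.

```python
# Hypothetical helper: the same chain of str.replace calls as in the answer.
def normalise(line):
    return line.replace(":", ": ").replace("=", ":").replace("[", "{").replace("]", "}")

# A few lines taken from the sample log
for raw in ["    a = 1", "            x:1,", "    c = [", "        ]"]:
    print(repr(normalise(raw)))
# '    a : 1'
# '            x: 1,'
# '    c : {'
# '        }'
```

Because the ":" padding runs before "=" is turned into ":", the colons that replace "=" are not padded a second time. The chain is applied to the text blindly, though, so a colon or bracket inside a value (a timestamp such as 12:30, say) would be rewritten as well.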

Next, we have a for loop, which is simply responsible for adding commas after lines where they're missing. The two cases in which commas are missing are:

  • after a numeric value
  • after a closing brace

We match the first of these cases using len(line)>0 and line[-1] in string.digits (the line is non-empty and its last character is a digit).

The second case is matched using i < len(lines) - 1 and line.endswith("}"). This checks that the line ends with } and that it is not the last line, since YAML won't allow a comma after the final closing brace.
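
The comma rule can also be tried on single lines in isolation. The sketch below restates the loop condition as a function; needs_comma and its is_last parameter are hypothetical names, not part of the answer's code.

```python
import string


def needs_comma(line, is_last):
    """Hypothetical helper restating the loop condition from the answer.

    `and` binds tighter than `or` in Python, so the one-line condition in the
    loop groups as (case 1) or (case 2); the parentheses here make that explicit.
    """
    # Case 1: non-empty line whose last character is a digit
    # (a trailing digit already implies there is no trailing comma)
    ends_with_digit = len(line) > 0 and line[-1] in string.digits
    # Case 2: a closing brace that is not on the last line
    closes_mapping = (not is_last) and line.endswith("}")
    return ends_with_digit or closes_mapping


print(needs_comma("    a : 1", is_last=False))          # True
print(needs_comma("            x: 1,", is_last=False))  # False
print(needs_comma("        }", is_last=False))          # True
print(needs_comma("}", is_last=True))                   # False
```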

After the loop, we have:

{
    a : 1,
    b : 2,
    c : {
            x: 1,
            y: 2,
            z: 3,
        },
    d : 4,
}

which is valid YAML. All that's left is to load it with PyYAML, and you've got yourself a Python dict.

If anything isn't clear please leave a comment and I'll happily elaborate.

Karin

There is probably a better answer, but I would take advantage of the fact that all your dictionary keys are at the same indentation level. There isn't an obvious way to do this with newline splitting, JSON loading, or that sort of thing, since the list structure is a bit weird (it seems like a cross between a list and a dictionary).

Here's an implementation that parses keys based on indentation level:

import re

log = '''{
    a = 1
    b = 2
    c = [
            x:1,
            y:2,
            z:3,
        ]
    d = 4
}'''
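log_lines = log.split('\n')[1:-1]  # strip the outer { and } lines
KEY_REGEX = re.compile(r'    [^ ]')  # a key line: exactly four spaces, then a non-space

d = {}
current_pair = ''
for line in log_lines:
    if KEY_REGEX.match(line):
        # A new key starts here, so store the pair collected so far
        if current_pair:
            key, value = current_pair.split('=', 1)
            d[key.strip()] = value.strip()
        current_pair = line
    else:
        # Continuation line (e.g. inside [ ]): append it to the current pair
        current_pair += line.strip()

# Store the last pair
if current_pair:
    key, value = current_pair.split('=', 1)
    d[key.strip()] = value.strip()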

print(d)

Output:

{'a': '1', 'b': '2', 'c': '[x:1,y:2,z:3,]', 'd': '4'}
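
This leaves the bracketed value as the raw string '[x:1,y:2,z:3,]'. If you also want that part as a dict, a small post-processing step could look like the sketch below; parse_bracket_value is a hypothetical helper, and it assumes a flat [ ... ] block (no nesting, no commas or colons inside names or values).

```python
def parse_bracket_value(value):
    """Hypothetical helper: '[x:1,y:2,z:3,]' -> {'x': '1', 'y': '2', 'z': '3'}.

    Assumes a flat list of name:value entries separated by commas, with no
    nested brackets and no commas or colons inside the names or values.
    """
    inner = value.strip()[1:-1]  # drop the surrounding [ and ]
    result = {}
    for entry in inner.split(","):
        entry = entry.strip()
        if not entry:
            continue  # the trailing comma leaves an empty last entry
        name, item = entry.split(":", 1)
        result[name.strip()] = item.strip()
    return result


# The dictionary produced by the indentation-based code
d = {'a': '1', 'b': '2', 'c': '[x:1,y:2,z:3,]', 'd': '4'}
parsed = {k: parse_bracket_value(v) if v.startswith('[') else v for k, v in d.items()}
print(parsed)
# {'a': '1', 'b': '2', 'c': {'x': '1', 'y': '2', 'z': '3'}, 'd': '4'}
```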
