Reputation: 4376
I have a system log that looks like the following:
{
a = 1
b = 2
c = [
x:1,
y:2,
z:3,
]
d = 4
}
I want to parse this in Python into a dictionary, with = separating keys from values. The array enclosed by [] should be preserved as well. I want to keep this as generic as possible so the parser can also handle future variations.
What I have tried so far (code still to be written): split each line on "=" into a key-value pair, find where [ and ] start and end, and then split the lines in between on ":" into key-value pairs. That seems a little hard-coded. Is there a better approach?
Upvotes: 0
Views: 644
Reputation: 9580
This could be pretty easily simplified to YAML. Run pip install pyyaml, then set up like so:
import string, yaml
data = """
{
a = 1
b = 2
c = [
x:1,
y:2,
z:3,
]
d = 4
}
"""
With this setup, you can use the following to parse your data:
data2 = data.replace(":", ": ").replace("=", ":").replace("[","{").replace("]","}")
lines = data2.splitlines()
for i, line in enumerate(lines):
    # add a comma after a bare numeric value, or after a closing brace that is not on the last line
    if len(line)>0 and line[-1] in string.digits and not line.endswith(",") or i < len(lines) - 1 and line.endswith("}"):
        lines[i] += ","
data3 = "\n".join(lines)
# safe_load is used because plain yaml.load needs an explicit Loader on current PyYAML
yaml.safe_load(data3) # {'a': 1, 'b': 2, 'c': {'x': 1, 'y': 2, 'z': 3}, 'd': 4}
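In the first line, we perform some simple substitutions:
- replace(":", ": "): YAML needs a space after the colon in a key-value pair, and with this we can ensure it.
- .replace("=", ":"): YAML separates keys from values with a colon rather than an equals sign.
- .replace("[","{").replace("]","}"): the bracketed block holds key-value pairs, so it becomes a YAML mapping, which is written with braces.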
At this point, your data looks like this:
{
a : 1
b : 2
c : {
x: 1,
y: 2,
z: 3,
}
d : 4
}
Next, we have a for loop. This is simply responsible for adding commas after lines where they're missing. The two cases in which commas are missing are:
- They're absent after a numeric value
- They're absent after a closing bracket
We match the first of these cases using len(line)>0 and line[-1] in string.digits (the last character in the line is a digit).
The second case is matched using i < len(lines) - 1 and line.endswith("}"). This checks if the line ends with }, and also checks that the line is not the last, since YAML won't allow a comma after the last bracket.
After the loop, we have:
{
a : 1,
b : 2,
c : {
x: 1,
y: 2,
z: 3,
},
d : 4,
}
which is valid YAML. All that's left is to load it with yaml.safe_load (recent PyYAML versions require a Loader argument for plain yaml.load), and you've got yourself a Python dict.
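Since the question asks for something generic, here is a rough sketch of how the comma rule could be loosened so it does not depend on values ending in a digit. The names to_flow_text and parse_log are made up for illustration, and the sketch assumes that values never contain ":", "=", "[" or "]" themselves, because the substitutions are still blind string replacements.

```python
def to_flow_text(text):
    """Rewrite the log text into flow-style mapping text (hypothetical helper)."""
    # Same substitutions as above: space after ':', '=' becomes ':', '[...]' becomes '{...}'.
    text = text.replace(":", ": ").replace("=", ":").replace("[", "{").replace("]", "}")
    # Drop blank lines so the final closing brace really is the last line.
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    out = []
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        # Every line except the last gets a comma, unless it opens a block
        # or already ends with a comma.
        if not is_last and not line.endswith(("{", ",")):
            line += ","
        out.append(line)
    return "\n".join(out)


def parse_log(text):
    """Parse the log into a dict; assumes PyYAML is installed (pip install pyyaml)."""
    import yaml  # third-party; imported here so to_flow_text works without it
    return yaml.safe_load(to_flow_text(text))
```

With this, a line such as name = web01 also gets its comma, which the digit check above would miss. Calling parse_log(data) on the sample should give the same dictionary as before.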
If anything isn't clear please leave a comment and I'll happily elaborate.
Upvotes: 1
Reputation: 8610
There is probably a better answer, but I would take advantage of all your dictionary keys being at the same indentation level. There's no obvious way to do this with newline splitting, JSON loading, or that sort of thing, since the list structure is a bit weird (it seems like a cross between a list and a dictionary).
Here's an implementation that parses keys based on indentation level:
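import re
# Indentation is assumed here (the parser relies on it):
# one space before each top-level key, two before nested lines.
log = '''{
 a = 1
 b = 2
 c = [
  x:1,
  y:2,
  z:3,
  ]
 d = 4
}'''
log_lines = log.split('\n')[1:-1] # strip bracket lines
KEY_REGEX = re.compile(r' [^ ]') # exactly one leading space marks a top-level key
d = {}
current_pair = ''
for i, line in enumerate(log_lines):
    if KEY_REGEX.match(line):
        # a new key starts: store the pair collected so far
        if current_pair:
            key, value = current_pair.split('=')
            d[key.strip()] = value.strip()
        current_pair = line
    else:
        # continuation line: append it to the current value
        current_pair += line.strip()
# store the final pair
if current_pair:
    key, value = current_pair.split('=')
    d[key.strip()] = value.strip()
print(d)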
Output:
{'d': '4', 'c': '[x:1,y:2,z:3,]', 'a': '1', 'b': '2'}
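Note that every value in that output is still a string, including the bracketed block. If you want real integers and a nested dictionary, a second pass along these lines might do. This is only a sketch: convert_scalar and convert_value are names I made up, and it assumes a single level of nesting with comma-separated key:value items.

```python
def convert_scalar(text):
    """Turn '4' into 4; leave anything that is not an integer as a string."""
    try:
        return int(text)
    except ValueError:
        return text


def convert_value(raw):
    """Convert one value string from the dictionary above into a Python object."""
    raw = raw.strip()
    if raw.startswith("[") and raw.endswith("]"):
        nested = {}
        for item in raw[1:-1].split(","):
            if not item.strip():
                continue  # skip the empty piece left by a trailing comma
            key, _, value = item.partition(":")
            nested[key.strip()] = convert_scalar(value.strip())
        return nested
    return convert_scalar(raw)


raw_dict = {'a': '1', 'b': '2', 'c': '[x:1,y:2,z:3,]', 'd': '4'}
parsed = {key: convert_value(value) for key, value in raw_dict.items()}
print(parsed)  # {'a': 1, 'b': 2, 'c': {'x': 1, 'y': 2, 'z': 3}, 'd': 4}
```

Deeper nesting or values that contain commas would need a proper recursive parser instead of split.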
Upvotes: 1