Reputation: 467
So I have a key value file that's similar to JSON's format but it's different enough to not be picked up by the Python JSON parser.
Example:
"Matt"
{
"Location" "New York"
"Age" "22"
"Items"
{
"Banana" "2"
"Apple" "5"
"Cat" "1"
}
}
Is there any easy way to parse this text file and store the values into an array such that I could access the data using a format similar to Matt[Items][Banana]? There is only to be one pair per line and a bracket should denote going down a level and going up a level.
Upvotes: 1
Views: 2462
Reputation: 47790
You could use re.sub
to 'fix up' your string and then parse it. As long as the format is always either a single quoted string or a pair of quoted strings on each line, you can use that to determine where to place commas and colons.
import re
s = """"Matt"
{
"Location" "New York"
"Age" "22"
"Items"
{
"Banana" "2"
"Apple" "5"
"Cat" "1"
}
}"""
# Put a colon after the first string in every line
s1 = re.sub(r'^\s*(".+?")', r'\1:', s, flags=re.MULTILINE)
# add a comma if the last non-whitespace character in a line is " or }
s2 = re.sub(r'(["}])\s*$', r'\1,', s1, flags=re.MULTILINE)
Once you've done that, you can use ast.literal_eval
to turn it into a Python dict. I use that over JSON parsing because it allows for trailing commas, without which the decision of where to put commas becomes a lot more complicated:
import ast
data = ast.literal_eval('{' + s2 + '}')
print data['Matt']['Items']['Banana']
# 2
Upvotes: 3
Reputation: 2549
Not sure how robust this approach is outside of the example you've posted but it does support for escaped characters and deeper levels of structured data. It's probably not going to be fast enough for large amounts of data.
The approach converts your custom data format to JSON using a (very) simple parser to add the required colons and braces, the JSON data can then be converted to a native Python dictionary.
import json
# Define the data that needs to be parsed
data = '''
"Matt"
{
"Location" "New \\"York"
"Age" "22"
"Items"
{
"Banana" "2"
"Apple" "5"
"Cat"
{
"foo" "bar"
}
}
}
'''
# Convert the data from custom format to JSON
json_data = ''
# Define parser states
state = 'OUT'
key_or_value = 'KEY'
for c in data:
# Handle quote characters
if c == '"':
json_data += c
if state == 'IN':
state = 'OUT'
if key_or_value == 'KEY':
key_or_value = 'VALUE'
json_data += ':'
elif key_or_value == 'VALUE':
key_or_value = 'KEY'
json_data += ','
else:
state = 'IN'
# Handle braces
elif c == '{':
if state == 'OUT':
key_or_value = 'KEY'
json_data += c
elif c == '}':
# Strip trailing comma and add closing brace and comma
json_data = json_data.rstrip().rstrip(',') + '},'
# Handle escaped characters
elif c == '\\':
state = 'ESCAPED'
json_data += c
else:
json_data += c
# Strip trailing comma
json_data = json_data.rstrip().rstrip(',')
# Wrap the data in braces to form a dictionary
json_data = '{' + json_data + '}'
# Convert from JSON to the native Python
converted_data = json.loads(json_data)
print(converted_data['Matt']['Items']['Banana'])
Upvotes: 0