Reputation: 452
I am relatively new to Python. I am trying to read an ASCII file that contains multiple dictionaries. The file has the following format:
{Key1: value1
key2: value2
...
}
{Key1: value1
key2: value2
...
}
{
...
Every dictionary in the file is a nested dictionary. I am trying to read the file as a list of dictionaries. Is there a simple way to do this? I have tried the following code, but it doesn't work:
data = json.load(open('doc.txt'))
Upvotes: 8
Views: 8627
Reputation: 9755
import re

# Text mode, so the str patterns below also work in Python 3
with open('doc.txt') as fl:
    text = fl.read()

result = [
    dict(
        re.match(
            r'^\s*(.*?)\s*:\s*(.*?)\s*$',  # splits on ':' ignoring surrounding whitespace
            line
        ).groups()
        for line in part.strip().split('\n')  # each line is one key-value pair
    )
    for part in re.findall(
        r'\{(.*?)\}',  # text inside { ... }
        text,
        flags=re.DOTALL  # lets '.' match '\n' as well
    )
]
Upvotes: 0
Reputation: 123473
Since the data in your input file isn't really in JSON or Python object literal format, you're going to need to parse it yourself. You haven't really specified what the allowable keys and values are in the dictionary, so the following only allows them to be alphanumeric character strings.
So, given an input file named doc.txt with the following contents:
{key1: value1
key2: value2
key3: value3
}
{key4: value4
key5: value5
}
The following reads and transforms it into a Python list of dictionaries composed of alphanumeric keys and values:
from pprint import pprint
import re

dictpat = r'\{((?:\s*\w+\s*:\s*\w+\s*)+)\}'  # note non-capturing (?:) inner group
itempat = r'(\s*(\w+)\s*:\s*(\w+)\s*)'  # which is captured in this expr

with open('doc.txt') as f:
    lod = [{group[1]: group[2] for group in re.findall(itempat, items)}
           for items in re.findall(dictpat, f.read())]

pprint(lod)
Output:
[{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'},
{'key4': 'value4', 'key5': 'value5'}]
Upvotes: 2
Reputation: 44444
Provided the inner elements are valid JSON, the following could work. I dug up the source of the simplejson library and modified it to suit your use case. An SSCCE is below.
import re
import simplejson

FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL
WHITESPACE = re.compile(r'[ \t\n\r]*', FLAGS)


def grabJSON(s):
    """Takes the largest bite of JSON from the string.
    Returns (object_parsed, remaining_string)
    """
    decoder = simplejson.JSONDecoder()
    obj, end = decoder.raw_decode(s)
    # Skip any whitespace that follows the parsed object
    end = WHITESPACE.match(s, end).end()
    return obj, s[end:]


def main():
    with open("out.txt") as f:
        s = f.read()
    while True:
        obj, remaining = grabJSON(s)
        print(">", obj)
        s = remaining
        if not remaining.strip():
            break


if __name__ == "__main__":
    main()
With some similar JSON in out.txt, this will output something like:
> {'hello': ['world', 'hell', {'test': 'haha'}]}
> {'hello': ['world', 'hell', {'test': 'haha'}]}
> {'hello': ['world', 'hell', {'test': 'haha'}]}
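The standard library's json module also provides raw_decode on json.JSONDecoder, so a similar loop can be sketched without simplejson. This is only a sketch: the helper name iter_json_objects and the sample string are made up for illustration, and it still assumes every block in the input is valid JSON.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical sample input: two JSON objects separated by a newline
sample = '{"a": {"b": 1}}\n{"c": [1, 2]}\n'
objects = list(iter_json_objects(sample))
print(objects)
```

Tracking an index instead of slicing the remaining string avoids copying the rest of the input after every object, which may matter for large files.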
Upvotes: 3
Reputation: 5154
You'll have to put the dictionaries in one big list to get it to work, i.e.:
[
{key1: val1, key2: val2, key3: val3, ...keyN: valN}
, {key1: val1, key2: val2, key3: val3, ...keyN: valN}
, {key1: val1, key2: val2, key3: val3, ...keyN: valN}
.
.
.
]
If you can't change the data file format, I'm afraid you'll have to roll your own function to interpret the data.
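For comparison, here is a minimal sketch of the list form being read with the json module. It assumes the keys and string values are quoted, since JSON requires that, and the sample data is invented for illustration.

```python
import json

# Hypothetical file contents: one JSON array holding nested dictionaries
text = """
[
  {"key1": "val1", "key2": {"inner": "val2"}},
  {"key1": "val3", "key2": {"inner": "val4"}}
]
"""

data = json.loads(text)  # a list of dicts
print(type(data).__name__, len(data))
print(data[1]["key2"]["inner"])
```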
Upvotes: 1