Rahul

Reputation: 452

How to read multiple dictionaries from a file in Python?

I am relatively new to Python. I am trying to read an ASCII file with multiple dictionaries in it. The file has the following format:

{Key1: value1
 key2: value2
 ...
}
{Key1: value1
 key2: value2
 ...
}
{
...

Every dictionary in the file is a nested dictionary. I am trying to read the file as a list of dictionaries. Is there a simple way to do this? I have tried the following code, but it doesn't seem to work:

import json

data = json.load(open('doc.txt'))

Upvotes: 8

Views: 8627

Answers (4)

Fomalhaut

Reputation: 9755

import re

with open('doc.txt') as fl:  # text mode, so the str patterns below apply
    text = fl.read()

result = [
    dict(
        re.match(
            r'^\s*(.*?)\s*:\s*(.*?)\s*$',  # splits on ':' ignoring surrounding whitespace
            line
        ).groups()
        for line in part.strip().split('\n')  # each line is one key-value pair
    )
    for part in re.findall(
        r'\{(.*?)\}',  # the text inside { ... }
        text,
        flags=re.DOTALL  # lets '.' match '\n' as well
    )
]

Upvotes: 0

martineau

Reputation: 123473

Since the data in your input file isn't really in JSON or Python object literal format, you're going to need to parse it yourself. You haven't really specified what the allowable keys and values are in the dictionary, so the following only allows them to be alphanumeric character strings.

So given an input file named doc.txt with the following contents:

{key1: value1
 key2: value2
 key3: value3
}
{key4: value4
 key5: value5
}

The following reads and transforms it into a Python list of dictionaries composed of alphanumeric keys and values:

from pprint import pprint
import re

dictpat = r'\{((?:\s*\w+\s*:\s*\w+\s*)+)\}' # note non-capturing (?:) inner group
itempat = r'(\s*(\w+)\s*:\s*(\w+)\s*)'      # which is captured in this expr

with open('doc.txt') as f:
    lod = [{group[1]:group[2] for group in re.findall(itempat, items)}
                                for items in re.findall(dictpat, f.read())]

pprint(lod)

Output:

[{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'},
 {'key4': 'value4', 'key5': 'value5'}]
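
The question mentions nested dictionaries, which the patterns above do not handle. As a rough sketch only, assuming a nested value is written as another {...} block and that keys and plain values are still alphanumeric, a small recursive parser could look like this. The function name parse_dicts and the nested sample text are made up for illustration; the real file may use a different syntax.

```python
import re

# Tokens are braces, colons, or runs of word characters (an assumption about the format).
TOKEN = re.compile(r'[{}:]|\w+')

def parse_dicts(text):
    """Parse consecutive {key: value ...} blocks; a value may itself be a {...} block."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_dict():
        nonlocal pos
        if tokens[pos] != '{':
            raise ValueError("expected '{' but found %r" % tokens[pos])
        pos += 1
        result = {}
        while tokens[pos] != '}':
            key = tokens[pos]
            if tokens[pos + 1] != ':':
                raise ValueError("expected ':' after key %r" % key)
            pos += 2
            if tokens[pos] == '{':
                result[key] = parse_dict()   # nested dictionary
            else:
                result[key] = tokens[pos]    # plain alphanumeric value
                pos += 1
        pos += 1  # skip the closing '}'
        return result

    dicts = []
    while pos < len(tokens):
        dicts.append(parse_dict())
    return dicts

sample = "{key1: value1\n key2: {inner1: a\n inner2: b}\n}\n{key3: value3\n}\n"
lod = parse_dicts(sample)
print(lod)
```

Values stay as strings here, as in the regex version; converting them to numbers or other types would need rules the question does not give.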

Upvotes: 2

UltraInstinct

Reputation: 44444

Provided the inner elements are valid JSON, the following could work. I dug through the source of the simplejson library and adapted it to suit your use case. An SSCCE is below.

import re
import simplejson

FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL
WHITESPACE = re.compile(r'[ \t\n\r]*', FLAGS)

def grabJSON(s):
    """Takes the largest bite of JSON from the string.
       Returns (object_parsed, remaining_string)
    """
    decoder = simplejson.JSONDecoder()
    obj, end = decoder.raw_decode(s)
    end = WHITESPACE.match(s, end).end()
    return obj, s[end:]

def main():
    with open("out.txt") as f:
        s = f.read()

    while True:
        obj, remaining = grabJSON(s)
        print(">", obj)
        s = remaining
        if not remaining.strip():
            break

if __name__ == "__main__":
    main()

With some similar JSON in out.txt, this will output something like:

> {'hello': ['world', 'hell', {'test': 'haha'}]}
> {'hello': ['world', 'hell', {'test': 'haha'}]}
> {'hello': ['world', 'hell', {'test': 'haha'}]}
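
If installing simplejson is not an option, the same idea should also work with the standard library's json module, since json.JSONDecoder has a raw_decode method too. The following is only a sketch under the same assumption (each object is valid JSON); the name iter_json_objects and the sample string are invented for the example.

```python
import json

def iter_json_objects(text):
    """Yield each JSON value that appears back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # The standard library's raw_decode does not skip leading whitespace,
        # so step over it before each object.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos == length:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

sample = '{"hello": ["world", {"test": "haha"}]}\n{"a": 1}\n'
objects = list(iter_json_objects(sample))
print(objects)
```

Passing the start index to raw_decode avoids re-slicing the string after every object, which matters mostly for large files.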

Upvotes: 3

shinkou

Reputation: 5154

You'll have to put the dictionaries in one big list to get it to work, i.e.:

[
    {key1: val1, key2: val2, key3: val3, ...keyN: valN}
    , {key1: val1, key2: val2, key3: val3, ...keyN: valN}
    , {key1: val1, key2: val2, key3: val3, ...keyN: valN}
    .
    .
    .
]

If you can't change the data file format, I'm afraid you'll have to roll your own function to interpret the data.
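
For illustration, here is roughly what the list form looks like when it is valid JSON. This assumes keys and string values can be written in double quotes, which JSON requires; the sample data is made up. The parsing is done with json.loads on a string so the snippet is self-contained, and json.load on an open file would behave the same way.

```python
import json

# Hypothetical file contents after wrapping the dictionaries in one list.
text = '''[
    {"key1": "val1", "key2": {"inner": "val2"}}
    , {"key1": "val3", "key2": {"inner": "val4"}}
]'''

data = json.loads(text)          # a Python list of (nested) dictionaries
print(len(data))
print(data[1]["key2"]["inner"])
```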

Upvotes: 1
