Reputation: 53
I am reading a .txt file that contains JSON objects which are not separated by commas. I would like to add commas between the JSON objects and place them all into a JSON list/array.
I have tried json.loads, but I get a JSONDecodeError. So I realized I am supposed to put commas between the different objects present in the .txt file.
Below is an example of the .txt file content:
{
"@mdate": "2011-01-11",
"@key": "journals/acta/Saxena96",
"author": {
"ftail": "\n",
"ftext": "Sanjeev Saxena"
},
"title": {
"ftail": "\n",
"ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
},
"pages": {
"ftail": "\n",
"ftext": "607-619"
},
"year": {
"ftail": "\n",
"ftext": "1996"
},
"volume": {
"ftail": "\n",
"ftext": "33"
},
"journal": {
"ftail": "\n",
"ftext": "Acta Inf."
},
"number": {
"ftail": "\n",
"ftext": "7"
},
"url": {
"ftail": "\n",
"ftext": "db/journals/acta/acta33.htmlfSaxena96"
},
"ee": {
"ftail": "\n",
"ftext": "http://dx.doi.org/10.1007/BF03036466"
},
"ftail": "\n",
"ftext": "\n"
}{
"@mdate": "2011-01-11",
"@key": "journals/acta/Simon83",
"author": {
"ftail": "\n",
"ftext": "Hans-Ulrich Simon"
},
"title": {
"ftail": "\n",
"ftext": "Pattern Matching in Trees and Nets."
},
"pages": {
"ftail": "\n",
"ftext": "227-248"
},
"year": {
"ftail": "\n",
"ftext": "1983"
},
"volume": {
"ftail": "\n",
"ftext": "20"
},
"journal": {
"ftail": "\n",
"ftext": "Acta Inf."
},
"url": {
"ftail": "\n",
"ftext": "db/journals/acta/acta20.htmlfSimon83"
},
"ee": {
"ftail": "\n",
"ftext": "http://dx.doi.org/10.1007/BF01257084"
},
"ftail": "\n",
"ftext": "\n"
}
''''''''''''''''''''''''''''''''''''
Expected Result:
''''''''''''''''''''''''''''''''''''
[
{
"@mdate": "2011-01-11",
"@key": "journals/acta/Saxena96",
"author": {
"ftail": "\n",
"ftext": "Sanjeev Saxena"
},
"title": {
"ftail": "\n",
"ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
},
"pages": {
"ftail": "\n",
"ftext": "607-619"
},
"year": {
"ftail": "\n",
"ftext": "1996"
},
"volume": {
"ftail": "\n",
"ftext": "33"
},
"journal": {
"ftail": "\n",
"ftext": "Acta Inf."
},
"number": {
"ftail": "\n",
"ftext": "7"
},
"url": {
"ftail": "\n",
"ftext": "db/journals/acta/acta33.htmlfSaxena96"
},
"ee": {
"ftail": "\n",
"ftext": "http://dx.doi.org/10.1007/BF03036466"
},
"ftail": "\n",
"ftext": "\n"
},
{
"@mdate": "2011-01-11",
"@key": "journals/acta/Simon83",
"author": {
"ftail": "\n",
"ftext": "Hans-Ulrich Simon"
},
"title": {
"ftail": "\n",
"ftext": "Pattern Matching in Trees and Nets."
},
"pages": {
"ftail": "\n",
"ftext": "227-248"
},
"year": {
"ftail": "\n",
"ftext": "1983"
},
"volume": {
"ftail": "\n",
"ftext": "20"
},
"journal": {
"ftail": "\n",
"ftext": "Acta Inf."
},
"url": {
"ftail": "\n",
"ftext": "db/journals/acta/acta20.htmlfSimon83"
},
"ee": {
"ftail": "\n",
"ftext": "http://dx.doi.org/10.1007/BF01257084"
},
"ftail": "\n",
"ftext": "\n"
}
]
''''''''''''''''''''
Upvotes: 3
Views: 4378
Reputation: 25799
If you can always guarantee that your JSON is formatted as in your example (i.e. each new JSON object begins on the same line where the previous one ends, and the closing brace has no indent), you can get by with reading the file into a buffer until you encounter such a line, then sending the buffer off for JSON parsing; rinse and repeat:
import json

parsed = []  # a list to hold the individually parsed JSON objects

with open('path/to/your.json') as f:
    buffer = ''
    for line in f:
        if line[0] == '}':  # end of the current JSON object
            parsed.append(json.loads(buffer + '}'))
            buffer = line[1:]  # keep the rest of the line, e.g. '}{' -> '{'
        else:
            buffer += line

print(json.dumps(parsed, indent=2))  # just to make sure it all went well
Which would yield:
[
{
"@mdate": "2011-01-11",
"@key": "journals/acta/Saxena96",
"author": {
"ftail": "\n",
"ftext": "Sanjeev Saxena"
},
"title": {
"ftail": "\n",
"ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
},
"pages": {
"ftail": "\n",
"ftext": "607-619"
},
"year": {
"ftail": "\n",
"ftext": "1996"
},
"volume": {
"ftail": "\n",
"ftext": "33"
},
"journal": {
"ftail": "\n",
"ftext": "Acta Inf."
},
"number": {
"ftail": "\n",
"ftext": "7"
},
"url": {
"ftail": "\n",
"ftext": "db/journals/acta/acta33.htmlfSaxena96"
},
"ee": {
"ftail": "\n",
"ftext": "http://dx.doi.org/10.1007/BF03036466"
},
"ftail": "\n",
"ftext": "\n"
},
{
"@mdate": "2011-01-11",
"@key": "journals/acta/Simon83",
"author": {
"ftail": "\n",
"ftext": "Hans-Ulrich Simon"
},
"title": {
"ftail": "\n",
"ftext": "Pattern Matching in Trees and Nets."
},
"pages": {
"ftail": "\n",
"ftext": "227-248"
},
"year": {
"ftail": "\n",
"ftext": "1983"
},
"volume": {
"ftail": "\n",
"ftext": "20"
},
"journal": {
"ftail": "\n",
"ftext": "Acta Inf."
},
"url": {
"ftail": "\n",
"ftext": "db/journals/acta/acta20.htmlfSimon83"
},
"ee": {
"ftail": "\n",
"ftext": "http://dx.doi.org/10.1007/BF01257084"
},
"ftail": "\n",
"ftext": "\n"
}
]
If your case is not as clear cut (i.e. you can't predict the formatting), you can try one of the iterative/event-based JSON parsers (ijson, for example), which can tell you when a 'root' object is closed so that you can split the parsed JSON objects into a sequence.
UPDATE: On second thought, you don't need anything apart from the built-in json module, even if your concatenated JSONs are not properly indented or not indented at all. You can use json.JSONDecoder.raw_decode() (and its second parameter) to traverse your data and look for valid JSON structures in an iterative manner until you've traversed the whole file (or encountered an error). For example:
import json

parser = json.JSONDecoder()
parsed = []  # a list to hold the individually parsed JSON structures

with open('test.json') as f:
    data = f.read()

head = 0  # hold the current position as we parse
while True:
    # find the next opening brace/bracket from the current position
    head = (data.find('{', head) + 1 or data.find('[', head) + 1) - 1
    try:
        struct, head = parser.raw_decode(data, head)
        parsed.append(struct)
    except (ValueError, json.JSONDecodeError):  # no more valid JSON structures
        break

print(json.dumps(parsed, indent=2))  # make sure it all went well
This should give you the same result as above, but this time it won't depend on } being the first character of a new line whenever a JSON object 'closes'. It should also work for JSON arrays stacked back-to-back.
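The same raw_decode() technique can also be wrapped in a small generator that yields one top-level structure at a time, which is convenient if you want to process objects as they are found rather than collect them all first. A minimal sketch (the helper name iter_json and the sample string are my own):

```python
import json

def iter_json(data):
    """Yield each top-level JSON structure found in `data`, in order."""
    parser = json.JSONDecoder()
    head = 0
    while True:
        # find the next opening brace/bracket from the current position
        head = (data.find('{', head) + 1 or data.find('[', head) + 1) - 1
        if head < 0:  # no more candidate structures
            return
        try:
            struct, head = parser.raw_decode(data, head)
        except json.JSONDecodeError:  # trailing garbage - stop
            return
        yield struct

# back-to-back objects with uneven formatting
blob = '{"a": 1}\n  {"b": 2}{"c": [3, 4]}'
print(list(iter_json(blob)))  # -> [{'a': 1}, {'b': 2}, {'c': [3, 4]}]
```

Because raw_decode() parses a complete structure in one call, a literal "}{" inside a string value does not confuse it.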
Upvotes: 0
Reputation: 58
You can add commas between the objects with a regexp:
import re

with open('name.txt', 'r') as infile, open('out.txt', 'w') as outfile:
    outfile.write("[\n")
    for line in infile:
        line = re.sub('}{', '},{', line)
        outfile.write(' ' + line)
    outfile.write("]\n")
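This works for the sample input, but note that the substitution is purely textual: it would also rewrite a literal "}{" inside a string value. A quick sketch of the idea on an in-memory string (the sample data is my own):

```python
import json
import re

src = '{"a": 1}{"b": 2}'  # two back-to-back JSON objects on one line
fixed = '[' + re.sub('}{', '},{', src) + ']'
objs = json.loads(fixed)
print(objs)  # -> [{'a': 1}, {'b': 2}]
```

If your data could contain "}{" inside string values, the raw_decode() approach in the other answer is the safer option.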
Upvotes: 2