Reputation: 53
I am reading a .txt file that contains JSON objects which are not separated by commas. I would like to add commas between the JSON objects and place them all into a JSON list/array.
I have tried json.loads, but I get a JSONDecodeError. So I realized I am supposed to put commas between the different objects present in the .txt file.
Below is an example of the .txt file content:
{
"@mdate": "2011-01-11",
"@key": "journals/acta/Saxena96",
"author": {
"ftail": "\n",
"ftext": "Sanjeev Saxena"
},
"title": {
"ftail": "\n",
"ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
},
"pages": {
"ftail": "\n",
"ftext": "607-619"
},
"year": {
"ftail": "\n",
"ftext": "1996"
},
"volume": {
"ftail": "\n",
"ftext": "33"
},
"journal": {
"ftail": "\n",
"ftext": "Acta Inf."
},
"number": {
"ftail": "\n",
"ftext": "7"
},
"url": {
"ftail": "\n",
"ftext": "db/journals/acta/acta33.htmlfSaxena96"
},
"ee": {
"ftail": "\n",
"ftext": "http://dx.doi.org/10.1007/BF03036466"
},
"ftail": "\n",
"ftext": "\n"
}{
"@mdate": "2011-01-11",
"@key": "journals/acta/Simon83",
"author": {
"ftail": "\n",
"ftext": "Hans-Ulrich Simon"
},
"title": {
"ftail": "\n",
"ftext": "Pattern Matching in Trees and Nets."
},
"pages": {
"ftail": "\n",
"ftext": "227-248"
},
"year": {
"ftail": "\n",
"ftext": "1983"
},
"volume": {
"ftail": "\n",
"ftext": "20"
},
"journal": {
"ftail": "\n",
"ftext": "Acta Inf."
},
"url": {
"ftail": "\n",
"ftext": "db/journals/acta/acta20.htmlfSimon83"
},
"ee": {
"ftail": "\n",
"ftext": "http://dx.doi.org/10.1007/BF01257084"
},
"ftail": "\n",
"ftext": "\n"
}
''''''''''''''''''''''''''''''''''''
Expected Result:
''''''''''''''''''''''''''''''''''''
[
{
"@mdate": "2011-01-11",
"@key": "journals/acta/Saxena96",
"author": {
"ftail": "\n",
"ftext": "Sanjeev Saxena"
},
"title": {
"ftail": "\n",
"ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
},
"pages": {
"ftail": "\n",
"ftext": "607-619"
},
"year": {
"ftail": "\n",
"ftext": "1996"
},
"volume": {
"ftail": "\n",
"ftext": "33"
},
"journal": {
"ftail": "\n",
"ftext": "Acta Inf."
},
"number": {
"ftail": "\n",
"ftext": "7"
},
"url": {
"ftail": "\n",
"ftext": "db/journals/acta/acta33.htmlfSaxena96"
},
"ee": {
"ftail": "\n",
"ftext": "http://dx.doi.org/10.1007/BF03036466"
},
"ftail": "\n",
"ftext": "\n"
},
{
"@mdate": "2011-01-11",
"@key": "journals/acta/Simon83",
"author": {
"ftail": "\n",
"ftext": "Hans-Ulrich Simon"
},
"title": {
"ftail": "\n",
"ftext": "Pattern Matching in Trees and Nets."
},
"pages": {
"ftail": "\n",
"ftext": "227-248"
},
"year": {
"ftail": "\n",
"ftext": "1983"
},
"volume": {
"ftail": "\n",
"ftext": "20"
},
"journal": {
"ftail": "\n",
"ftext": "Acta Inf."
},
"url": {
"ftail": "\n",
"ftext": "db/journals/acta/acta20.htmlfSimon83"
},
"ee": {
"ftail": "\n",
"ftext": "http://dx.doi.org/10.1007/BF01257084"
},
"ftail": "\n",
"ftext": "\n"
}
]
''''''''''''''''''''
Upvotes: 3
Views: 4378
Reputation: 25799
If you can always guarantee that your JSON is formatted as in your example (i.e. each new JSON object begins on the same line where the previous one ends, and the closing brace has no indent), you can get by with reading the file into a buffer until you encounter such a line, then sending the buffer off for JSON parsing; rinse and repeat:
import json

parsed = []  # a list to hold the individually parsed JSON objects

with open('path/to/your.json') as f:
    buffer = ''
    for line in f:
        if line[0] == '}':  # end of the current JSON object
            parsed.append(json.loads(buffer + '}'))
            buffer = line[1:]  # keep the rest of the line, e.g. '}{' -> '{'
        else:
            buffer += line

print(json.dumps(parsed, indent=2))  # just to make sure it all went well
Which would yield:
[
{
"@mdate": "2011-01-11",
"@key": "journals/acta/Saxena96",
"author": {
"ftail": "\n",
"ftext": "Sanjeev Saxena"
},
"title": {
"ftail": "\n",
"ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
},
"pages": {
"ftail": "\n",
"ftext": "607-619"
},
"year": {
"ftail": "\n",
"ftext": "1996"
},
"volume": {
"ftail": "\n",
"ftext": "33"
},
"journal": {
"ftail": "\n",
"ftext": "Acta Inf."
},
"number": {
"ftail": "\n",
"ftext": "7"
},
"url": {
"ftail": "\n",
"ftext": "db/journals/acta/acta33.htmlfSaxena96"
},
"ee": {
"ftail": "\n",
"ftext": "http://dx.doi.org/10.1007/BF03036466"
},
"ftail": "\n",
"ftext": "\n"
},
{
"@mdate": "2011-01-11",
"@key": "journals/acta/Simon83",
"author": {
"ftail": "\n",
"ftext": "Hans-Ulrich Simon"
},
"title": {
"ftail": "\n",
"ftext": "Pattern Matching in Trees and Nets."
},
"pages": {
"ftail": "\n",
"ftext": "227-248"
},
"year": {
"ftail": "\n",
"ftext": "1983"
},
"volume": {
"ftail": "\n",
"ftext": "20"
},
"journal": {
"ftail": "\n",
"ftext": "Acta Inf."
},
"url": {
"ftail": "\n",
"ftext": "db/journals/acta/acta20.htmlfSimon83"
},
"ee": {
"ftail": "\n",
"ftext": "http://dx.doi.org/10.1007/BF01257084"
},
"ftail": "\n",
"ftext": "\n"
}
]
If your case is not as clear cut (i.e. you can't predict the formatting), you can try one of the iterative/event-based JSON parsers (ijson, for example), which can tell you when a 'root' object is closed so that you can split the parsed JSON objects into a sequence.
UPDATE: On second thought, you don't need anything apart from the built-in json module, even if your concatenated JSONs are not properly indented or not indented at all. You can use json.JSONDecoder.raw_decode() (and its second parameter) to traverse your data and look for valid JSON structures in an iterative manner until you've traversed the whole file (or encountered an error). For example:
import json

parser = json.JSONDecoder()
parsed = []  # a list to hold the individually parsed JSON structures

with open('test.json') as f:
    data = f.read()

head = 0  # hold the current position as we parse
while True:
    # find the next opening brace/bracket from the current position
    head = (data.find('{', head) + 1 or data.find('[', head) + 1) - 1
    try:
        struct, head = parser.raw_decode(data, head)
        parsed.append(struct)
    except (ValueError, json.JSONDecodeError):  # no more valid JSON structures
        break

print(json.dumps(parsed, indent=2))  # make sure it all went well
This should give you the same result as above, but this time it won't depend on } being the first character of a new line whenever a JSON object 'closes'. It should also work for JSON arrays stacked back-to-back.
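The same raw_decode() technique can also be wrapped in a small generator that yields one top-level structure at a time, which is convenient if you want to process objects as they are found rather than collect them all first. A minimal sketch (the helper name iter_json and the sample string are my own):

```python
import json

def iter_json(data):
    """Yield each top-level JSON structure found in `data`, in order."""
    parser = json.JSONDecoder()
    head = 0
    while True:
        # find the next opening brace/bracket from the current position
        head = (data.find('{', head) + 1 or data.find('[', head) + 1) - 1
        if head < 0:  # no more candidate structures
            return
        try:
            struct, head = parser.raw_decode(data, head)
        except json.JSONDecodeError:  # trailing garbage - stop
            return
        yield struct

# back-to-back objects with uneven formatting
blob = '{"a": 1}\n  {"b": 2}{"c": [3, 4]}'
print(list(iter_json(blob)))  # -> [{'a': 1}, {'b': 2}, {'c': [3, 4]}]
```

Because raw_decode() parses a complete structure in one call, a literal "}{" inside a string value does not confuse it.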
Upvotes: 0
Reputation: 58
You can add commas between the objects with a regexp:
import re

with open('name.txt', 'r') as infile, open('out.txt', 'w') as outfile:
    outfile.write("[\n")
    for line in infile:
        line = re.sub('}{', '},{', line)
        outfile.write(' ' + line)
    outfile.write("]\n")
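This works for the sample input, but note that the substitution is purely textual: it would also rewrite a literal "}{" inside a string value. A quick sketch of the idea on an in-memory string (the sample data is my own):

```python
import json
import re

src = '{"a": 1}{"b": 2}'  # two back-to-back JSON objects on one line
fixed = '[' + re.sub('}{', '},{', src) + ']'
objs = json.loads(fixed)
print(objs)  # -> [{'a': 1}, {'b': 2}]
```

If your data could contain "}{" inside string values, the raw_decode() approach in the other answer is the safer option.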
Upvotes: 2