Reputation: 65
I'm trying to parse the json format data to json.load() method. But it's giving me an error. I tried different methods like reading line by line, convert into dictionary, list, and so on but it isn't working. I also tried the solution mention in the following url loading-and-parsing-a-json but it give's me the same error.
import json
data = []
with open('output.txt','r') as f:
for line in f:
data.append(json.loads(line))
Error:
ValueError: Extra data: line 1 column 71221 - line 1 column 6783824 (char 71220 - 6783823)
Please find the output.txt in the below URL
Upvotes: 1
Views: 2240
Reputation: 864
I wrote up the following which will break up your file into one JSON string per line and then go back through it and do what you originally intended. There's certainly room for optimization here, but at least it works as you expected now.
import json
import re
PATTERN = '{"statuses"'
file_as_str = ''
with open('output.txt', 'r+') as f:
file_as_str = f.read()
m = re.finditer(PATTERN, file_as_str)
f.seek(0)
for pos in m:
if pos.start() == 0:
pass
else:
f.seek(pos.start())
f.write('\n{"')
data = []
with open('output.txt','r') as f:
for line in f:
data.append(json.loads(line))
Upvotes: 1
Reputation: 168616
Your alleged JSON file is not a properly formatted JSON file. JSON files must contain exactly one object (a list, a mapping, a number, a string, etc). Your file appears to contain a number of JSON objects in sequence, but not in the correct format for a list.
Your program's JSON parser correctly returns an error condition when presented with this non-JSON data.
Here is a program that will interpret your file:
import json
# Idea and some code stolen from https://gist.github.com/sampsyo/920215
data = []
with open('output.txt') as f:
s = f.read()
decoder = json.JSONDecoder()
while s.strip():
datum, index = decoder.raw_decode(s)
data.append(datum)
s = s[index:]
print len(data)
Upvotes: 1