Reputation: 127
I have a JSON file that contains at least 30,000 dicts. It can be found here:
http://openxcplatform.com.s3.amazonaws.com/traces/nyc/downtown-west.json
I have scoured the internet and found that the following code brings me closest to what I need. I have to read through the JSON file one object at a time and append each object, as an actual dict, to a list:
import ast
import json

it = []
with open("test.json") as data_file:
    for x in data_file:
        json.dumps(it.append(ast.literal_eval(x)))
I tested this code and it mostly works. It handles the first 2,000 elements, but when I run it on the entire file I get this error:
  File "converter.py", line 58, in <module>
    if __name__ == "__main__": main()
  File "converter.py", line 34, in main
    json.dumps(it.append(ast.literal_eval(x)))
  File "/usr/lib/python2.7/ast.py", line 80, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib/python2.7/ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "/usr/lib/python2.7/ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "/usr/lib/python2.7/ast.py", line 79, in _convert
    raise ValueError('malformed string')
ValueError: malformed string
Does anyone know why this is happening?
Upvotes: 0
Views: 355
Reputation: 127
I found that the answers to the question "TypeError: expected string or buffer in Google App Engine's Python" helped get the program to behave properly. Using json.loads alone gave me a TypeError.
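The linked question isn't reproduced here, so this is only a guess at the cause: one common way to get a TypeError from json.loads is to pass it something other than a string, such as the file object itself. json.loads parses a string, while json.load reads from a file-like object. A minimal Python 3 sketch, using an in-memory io.StringIO and a hypothetical one-line record as stand-ins for the real file:

```python
import io
import json

# Hypothetical sample line, standing in for one line of the real file.
text = '{"key": "value", "flag": false}\n'

# json.loads parses a *string*; handing it a file object raises TypeError.
try:
    json.loads(io.StringIO(text))
    loads_accepted_file = True
except TypeError:
    loads_accepted_file = False

# json.load reads from a file-like object instead.
record = json.load(io.StringIO(text))

# json.loads is the right call for a line that has already been read as text.
assert json.loads(text) == record
print(loads_accepted_file)  # False
```

For a file with one object per line, that means iterating over the file and calling json.loads on each line, rather than calling either function on the file as a whole.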
Upvotes: 0
Reputation: 2253
First, the file is not a single JSON document but JSON Lines: each line holds one complete JSON object.
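To illustrate the difference, here is a minimal sketch with a hypothetical two-line sample (not taken from the actual file). Parsing the whole text as one JSON document fails with an "Extra data" error because a second object follows the first, while parsing each non-blank line separately succeeds:

```python
import json

# Hypothetical JSON Lines sample: one complete JSON object per line.
sample = '{"a": 1, "ok": true}\n{"a": 2, "ok": false}\n'

# The text as a whole is not a single JSON document.
try:
    json.loads(sample)
    whole_document_ok = True
except ValueError:  # "Extra data"; Python 3's JSONDecodeError subclasses ValueError
    whole_document_ok = False

# Each line, however, is valid JSON on its own.
records = [json.loads(line) for line in sample.splitlines() if line.strip()]

print(whole_document_ok)  # False
print(records)
```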
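Second, you don't want to read JSON data with ast.literal_eval, since it 1) is not safe against hostile input (a small crafted string can exhaust memory or crash the interpreter) and 2) is not a JSON parser, so it throws an error when it sees false or true (Python spells these False and True).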
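A minimal reproduction, assuming the failing line contains a JSON boolean (the line below is a hypothetical example, not taken from the file). ast.literal_eval only accepts Python literal syntax, so the bare name false is rejected with a ValueError (the "malformed string" error from the question on Python 2.7), while json.loads maps it to Python's False:

```python
import ast
import json

line = '{"name": "example", "value": false}'  # hypothetical JSON Lines record

try:
    ast.literal_eval(line)
    literal_eval_ok = True
except ValueError:  # "malformed string" on 2.7, "malformed node or string" on 3.x
    literal_eval_ok = False

parsed = json.loads(line)  # JSON false becomes Python False

print(literal_eval_ok)  # False
print(parsed["value"])  # False
```

This also explains why the first 2,000 lines worked: lines without true, false or null happen to be valid Python literals as well.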
Use json.loads instead, one line at a time.
Upvotes: 2
Reputation: 42440
You don't want to use json.dumps, as that converts a dict to JSON (and since list.append returns None, your call only ever serializes None). You are doing the reverse: reading JSON and converting it to a dict. You need to use json.loads() for that:
import json

it = []
failures = []
with open('your_file.json') as f:
    for line in f:
        try:
            it.append(json.loads(line))
        except ValueError:  # raised by json.loads for malformed lines
            failures.append(line)
print('Parsed {0} lines'.format(len(it)))
print('Failed {0} lines'.format(len(failures)))
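If you also want to know why a line failed, a variation on the loop above can keep the line number and the parser's error message with each failure. This is a sketch: parse_json_lines is a made-up helper name, and skipping blank lines is an assumption about what you want. It takes any iterable of lines, so an open file object works directly:

```python
import json


def parse_json_lines(lines):
    """Parse JSON Lines text; return (records, failures).

    Hypothetical helper: each failure is a (line_number, error_message, line)
    tuple so malformed lines can be inspected afterwards.
    """
    records = []
    failures = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline at the end
        try:
            records.append(json.loads(line))
        except ValueError as exc:  # json.loads signals bad input with ValueError
            failures.append((line_number, str(exc), line))
    return records, failures
```

Usage would be along the lines of `with open('your_file.json') as f: records, failures = parse_json_lines(f)`, after which printing the first few entries of failures shows exactly which lines were rejected and the parser's reason.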
Upvotes: 1