Reginsmal

Reputation: 127

Error in reading from json file with multiple dicts

I have a json file that contains at least 30 000 dicts. It can be found here:

http://openxcplatform.com.s3.amazonaws.com/traces/nyc/downtown-west.json

I have scoured the internet, and the code below is the closest I have come to what I need: reading the JSON file one line at a time and appending each record to a list as an actual dict:

with open("test.json") as data_file:
    for x in data_file:
        json.dumps(it.append(ast.literal_eval(x)))

This code works on the first 2000 elements, but when I run it on the entire file I get this error:

  File "converter.py", line 58, in <module>
    if __name__ == "__main__": main()
  File "converter.py", line 34, in main
    json.dumps(it.append(ast.literal_eval(x)))
  File "/usr/lib/python2.7/ast.py", line 80, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib/python2.7/ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "/usr/lib/python2.7/ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "/usr/lib/python2.7/ast.py", line 79, in _convert
    raise ValueError('malformed string')
ValueError: malformed string

Anyone know why this may be happening?

Upvotes: 0

Views: 355

Answers (3)

Reginsmal

Reputation: 127

I found that the answers to the question "TypeError: expected string or buffer in Google App Engine's Python" helped get the program to behave properly. Using json.loads on its own gave me a TypeError.
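One plausible cause of that TypeError, offered as a guess since the failing call isn't shown here: json.loads expects a string, so handing it the open file object instead of a single line fails. A minimal sketch, using an in-memory file and made-up records as a stand-in for test.json:

```python
import io
import json

# Stand-in for the real file: one JSON object per line (made-up sample data).
data_file = io.StringIO(u'{"name": "a", "value": true}\n{"name": "b", "value": 2}\n')

try:
    json.loads(data_file)      # wrong: a file object is not a string
    raised = False
except TypeError:
    raised = True

data_file.seek(0)              # rewind before reading the lines
records = [json.loads(line) for line in data_file]  # right: one string per line
```

If the whole file really were a single JSON document, json.load(data_file) (without the trailing "s") would be the call that accepts a file object.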

Upvotes: 0

hruske

Reputation: 2253

First, the file is not a single JSON document but JSON Lines: one JSON object per line.

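Second, you don't want to read JSON data with ast.literal_eval, since it 1) parses Python literals and is not a JSON parser, so it throws an error when it sees false, true or null, and 2) is not hardened against hostile input (a sufficiently large or deeply nested string can crash the interpreter).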

Use json.loads.
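To illustrate both points, here is a small sketch on two made-up records (the field names and values are only examples, not taken from the real trace file). It shows that the text as a whole is not valid JSON, that ast.literal_eval rejects a line containing false, and that json.loads applied per line handles both:

```python
import ast
import json

# Two made-up records, one JSON object per line (JSON Lines).
text = ('{"name": "brake_pedal_status", "value": false}\n'
        '{"name": "engine_speed", "value": 772}\n')

# 1) The text as a whole is not one JSON document.
try:
    json.loads(text)
    whole_file_ok = True
except ValueError:
    whole_file_ok = False

# 2) literal_eval parses Python literals, so JSON's lowercase false is rejected.
first_line = text.splitlines()[0]
try:
    ast.literal_eval(first_line)
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False

# Parsing line by line with json.loads handles both records.
records = [json.loads(line) for line in text.splitlines()]
```

This may also explain why a prefix of the file can parse fine with literal_eval: lines holding only numbers and strings are valid Python literals too, and the failure only shows up once a boolean or null appears.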

Upvotes: 2

Ayush

Reputation: 42440

You don't want to use json.dumps, as that converts a dict to JSON. You are doing the reverse: reading JSON and converting it to a dict. You need to use json.loads() for that:

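import json

it = []        # successfully parsed dicts
failures = []  # raw lines that could not be parsed

with open('your_file.json') as f:
    for line in f:
        try:
            it.append(json.loads(line))
        except ValueError:  # json.loads raises ValueError on invalid JSON
            failures.append(line)

print('Parsed {0} lines'.format(len(it)))
print('Failed {0} lines'.format(len(failures)))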
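As a quick illustration of the direction each function works in, here is a minimal round trip on a made-up dict (the record itself is just an example):

```python
import json

record = {"name": "engine_speed", "value": 772}   # made-up sample dict

encoded = json.dumps(record)    # dict -> JSON text (a string)
decoded = json.loads(encoded)   # JSON text -> dict

is_text = isinstance(encoded, str)
round_trip_ok = (decoded == record)
```

So json.dumps is the call for writing records back out, and json.loads is the one for reading each line in.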

Upvotes: 1
