Reputation: 1447
The JSON file content is as follows:
{"votes": {"funny": 0, "useful": 5, "cool": 2}, "user_id": "rLtl8ZkDX5vH5nAx9C3q5Q", "review_id": "fWKvX83p0-ka4JS3dc6E5A", "stars": 5, "date": "2011-01-26", "text": "My wife took me here on my birthday for breakfast and it was excellent. It looked like the place fills up pretty quickly so the earlier you get here the better.\n\nDo yourself a favor and get their Bloody Mary. It came with 2 pieces of their griddled bread with was amazing and it absolutely made the meal complete. It was the best \"toast\" I've ever had.\n\nAnyway, I can't wait to go back!", "type": "review", "business_id": "9yKzy9PApeiPPOUJEtnvkg"}
{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "0a2KyEL0d3Yb1V6aivbIuQ", "review_id": "IjZ33sJrzXqU-0X6U8NwyA", "stars": 5, "date": "2011-07-27", "text": "I have no idea why some people give bad reviews about this place. It goes to show you, you can please everyone. They are probably griping about something that their own fault... but they said we'll be seated when the girl comes back from seating someone else. So, everything was great and not like these bad reviewers. That goes to show you that you have to try these things yourself because all these bad reviewers have some serious issues.", "type": "review", "business_id": "ZRJwVLyzEJq1VAihDhYiow"}
My code is:
import json
from pprint import pprint
review = open('/User/Desktop/python/test.json')
data = json.load(review)
pprint(data["votes"])
The error is:
Traceback (most recent call last):
  File "/Users/hadoop/Documents/workspace/dataming-course/src/Yelp/main.py", line 8, in <module>
    data = json.load(review)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 363, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 3 column 1 (char 623 - 1294)
Upvotes: 1
Views: 454
Reputation: 7583
For what it's worth, you could try putting your JSON into an array, like this:
[ { "business_id" : "9yKzy9PApeiPPOUJEtnvkg",
"date" : "2011-01-26",
"review_id" : "fWKvX83p0-ka4JS3dc6E5A",
"stars" : "5",
"text" : "My wife took me here on my birthday for breakfast and it was excellent. It looked like the place fills up pretty quickly so the earlier you get here the better.\n\nDo yourself a favor and get their Bloody Mary. It came with 2 pieces of their griddled bread with was amazing and it absolutely made the meal complete. It was the best \"toast\" I've ever had.\n\nAnyway, I can't wait to go back!",
"type" : "review",
"user_id" : "rLtl8ZkDX5vH5nAx9C3q5Q",
"votes" : { "cool" : "2",
"funny" : "0",
"useful" : "5"
}
},
{ "business_id" : "ZRJwVLyzEJq1VAihDhYiow",
"date" : "2011-07-27",
"review_id" : "IjZ33sJrzXqU-0X6U8NwyA",
"stars" : "5",
"text" : "I have no idea why some people give bad reviews about this place. It goes to show you, you can please everyone. They are probably griping about something that their own fault... but they said we'll be seated when the girl comes back from seating someone else. So, everything was great and not like these bad reviewers. That goes to show you that you have to try these things yourself because all these bad reviewers have some serious issues.",
"type" : "review",
"user_id" : "0a2KyEL0d3Yb1V6aivbIuQ",
"votes" : { "cool" : "0",
"funny" : "0",
"useful" : "0"
}
}
]
(And do note the , that separates the two "main" parts of the JSON array :)
Upvotes: 1
Reputation: 33397
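If you can't change the input file, you can use JSONDecoder.raw_decode to parse it in chunks. It returns the decoded value together with the index in the string where that value ended, so you can pass that index back in to decode the next document.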
>>> dec = json.JSONDecoder()
>>> dec.raw_decode('["a",1]{"foo": 2}')
(['a', 1], 7)
>>> dec.raw_decode('["a",1]{"foo": 2}', 7)
({'foo': 2}, 17)
You will need to read the file into a string first.
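To show how the chunked decoding above might be turned into a loop, here is a minimal sketch. The helper name iter_json_documents and the sample string are made up for illustration; only json.JSONDecoder.raw_decode comes from the standard library. It assumes the documents are separated by nothing but whitespace, which raw_decode does not skip on its own.

```python
import json


def iter_json_documents(text):
    """Yield each JSON value found back to back in text (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns (value, index just past the end of that value).
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Illustrative stand-in for the file contents, not the real review data.
sample = (
    '{"stars": 5, "votes": {"funny": 0, "useful": 5, "cool": 2}}\n'
    '{"stars": 5, "votes": {"funny": 0, "useful": 0, "cool": 0}}\n'
)
docs = list(iter_json_documents(sample))
for doc in docs:
    print(doc["votes"])
```

For a real file, the string would come from something like open(path).read(), with path pointing at your own file.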
Upvotes: 2
Reputation: 249123
You have two JSON documents in a single file, and json.load expects exactly one; that is what the "Extra data" error is telling you. Consider putting them into an array. The top level of the file should contain only a single JSON value.
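If the file cannot be rewritten as an array, and each document sits on its own line as it appears to in the question, one option is to parse the lines one at a time. This is only a sketch under that one-document-per-line assumption; load_json_lines and sample_lines are hypothetical names used for illustration.

```python
import json


def load_json_lines(lines):
    """Parse an iterable of lines, each holding one JSON document (hypothetical helper)."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # ignore blank lines between documents
            records.append(json.loads(line))
    return records


# Illustrative stand-in for the lines of the file, not the real review data.
sample_lines = [
    '{"stars": 5, "votes": {"funny": 0, "useful": 5, "cool": 2}}\n',
    '\n',
    '{"stars": 5, "votes": {"funny": 0, "useful": 0, "cool": 0}}\n',
]
reviews = load_json_lines(sample_lines)
for review in reviews:
    print(review["votes"])
```

An open file object iterates over its lines, so it can be passed to the helper directly in place of the list.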
Upvotes: 5