I'm trying to get some information out of a very complex JSON file with Python. Below is just one object from the file:
{
"__metadata": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)", "etag": "W/\"2\"", "type": "Microsoft.SharePoint.DataService.PostsItem"
}, "Title": "Term 2 Round 2 draws", "Body": "<div class=\"ExternalClass0BC1BCA4D3EE45A4A1F34086034FE827\"><p>\u200bAs there is no Gonzagan this week the following Senior Sport information has been provided here.\r\n\t </p>\r\n<ul><li><a target=\"_blank\" href=\"/Intranet/students/news_resources/2011/Term2/Knox _wet_weather.pdf\">Knox _wet_weather</a> Cancellations, please see <a target=\"_blank\" href=\"http://www.twitter.com/SACWetWeather\">twitter page</a> for further news.</li>\r\n<li><a target=\"_blank\" href=\"/Intranet/students/news_resources/2011/Term2/2011_Football_round_2.pdf\">2011 Football draw Round 2</a></li>\r\n<li><a target=\"_blank\" href=\"/Intranet/students/news_resources/2011/Term2/2011_Rugby_round_2.pdf\">2011 Rugby draw Round 2</a></li></ul>\r\n<p></p></div>", "Category": {
"__deferred": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)/Category"
}
}, "Published": "\/Date(1308342960000)\/", "ContentTypeID": "0x0110001F9F7104FDD3054AAB40D8561196E09E", "ApproverComments": null, "Comments": {
"__deferred": {
"uri": "/_vti_bin/ListData.svc/Posts(4)/Comments"
}
}, "CommentsId": 0, "ApprovalStatus": "0", "Id": 4, "ContentType": "Post", "Modified": "\/Date(1309122092000)\/", "Created": "\/Date(1309120597000)\/", "CreatedBy": {
"__deferred": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)/CreatedBy"
}
}, "CreatedById": 1, "ModifiedBy": {
"__deferred": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)/ModifiedBy"
}
}, "ModifiedById": 1, "Owshiddenversion": 2, "Version": "1.0", "Path": "/Students/news/Lists/Posts"
},
I can't wrap my head around editing this. Converting it to a Python dictionary seems to jumble the order of the attributes, making it impossible for me to find where one object ends and the next begins. What's the best way for me to extract just the 'title', 'body' and 'published' keys and values, and how would I do it for multiple objects?
import json

obj = json.loads(json_input)
for record in obj:
    # Read from the loop variable; JSON keys are case-sensitive ("Title", not "title").
    print(record["Title"])
    print(record["Body"])
    print(record["Published"])
This presumes that json_input holds the above snippet as a string, for example one already read in from a file. Also note that, based on your question, I presumed the snippet is one element of a collection (a JSON array) of such objects.
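A self-contained sketch of the same loop (Python 3), assuming the file's top level is a JSON array of post objects like the one in the question. The sample string is a hypothetical, trimmed-down stand-in for json_input, and extract_posts is a hypothetical helper name. Because each field is looked up by key, the order in which the keys come out of json.loads does not matter:

```python
import json

# Hypothetical, trimmed-down stand-in for json_input: a JSON array of posts.
json_input = """[
  {"Title": "Term 2 Round 2 draws", "Body": "<p>draws</p>",
   "Published": "\\/Date(1308342960000)\\/", "Id": 4},
  {"Title": "Hypothetical second post", "Body": "<p>more</p>",
   "Published": "\\/Date(1309122092000)\\/", "Id": 5}
]"""


def extract_posts(records):
    """Keep only the Title, Body and Published fields of each record."""
    # Fields are selected by name, so key order in the parsed dicts is irrelevant.
    return [
        {key: record[key] for key in ("Title", "Body", "Published")}
        for record in records
    ]


posts = extract_posts(json.loads(json_input))
for post in posts:
    print(post["Title"], post["Published"])
```

Note that json.loads already turns the JSON escape "\/" into a plain "/", so Published arrives as "/Date(1308342960000)/".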
Update
Based on the example, your data has another layer of nesting that was not present in the snippet originally posted.
Change the loop to:
for record in obj["d"]["results"]:
    ...
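If you are not sure which of the two shapes a given file has, a sketch like the following accepts both. It assumes the only shapes are a bare array and the {"d": {"results": [...]}} wrapper from the update; the helper name get_results and the sample data are hypothetical:

```python
import json


def get_results(obj):
    """Return the list of records from either a bare JSON array or the
    {"d": {"results": [...]}} wrapper (both shapes are assumptions)."""
    if isinstance(obj, list):
        return obj
    return obj["d"]["results"]


# Hypothetical sample of the wrapped shape.
wrapped = json.loads('{"d": {"results": [{"Title": "A", "Body": "b", "Published": "p"}]}}')

for record in get_results(wrapped):
    print(record["Title"], record["Body"], record["Published"])
```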
I'm assuming your main JSON object is an array of those objects. Here's how I'd print out the information you're after:
import json

# json.load expects an open file object, not a filename.
with open('my_json_file.json') as f:
    main_array = json.load(f)

for sub_object in main_array:
    print("Title: {}\nBody: {}\nPublished: {}\n".format(
        sub_object['Title'], sub_object['Body'], sub_object['Published']
    ))
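The Published values come out as strings like "/Date(1308342960000)/". If you need real dates, a sketch along these lines (Python 3) converts them. It assumes the number is milliseconds since the Unix epoch in UTC, which is the usual meaning of this /Date(...)/ convention in Microsoft JSON serializers, and that no "+hhmm" offset suffix is present; parse_ms_date is a hypothetical helper:

```python
import re
from datetime import datetime, timedelta, timezone

# Matches strings such as "/Date(1308342960000)/"; assumes no "+hhmm" offset suffix.
_MS_DATE = re.compile(r"^/Date\((-?\d+)\)/$")


def parse_ms_date(value):
    """Convert a "/Date(<milliseconds>)/" string to a timezone-aware UTC datetime."""
    match = _MS_DATE.match(value)
    if match is None:
        raise ValueError("unexpected date format: %r" % value)
    millis = int(match.group(1))
    # Assumed: milliseconds since 1970-01-01T00:00:00Z.
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=millis)


published = parse_ms_date("/Date(1308342960000)/")
print(published.isoformat())  # 2011-06-17T20:36:00+00:00
```

Adding a timedelta to the epoch, rather than calling a timestamp conversion, also handles negative (pre-1970) values consistently across platforms.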