Reputation: 361

Extracting from a very complex JSON file in Python

I'm trying to get some information out of a very complex JSON file with Python. Below is just one object from the file:

{
"__metadata": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)", "etag": "W/\"2\"", "type": "Microsoft.SharePoint.DataService.PostsItem"
}, "Title": "Term 2 Round 2 draws", "Body": "<div class=\"ExternalClass0BC1BCA4D3EE45A4A1F34086034FE827\"><p>\u200bAs there is no Gonzagan this week the following Senior Sport information has been provided here.\r\n\t    </p>\r\n<ul><li><a target=\"_blank\" href=\"/Intranet/students/news_resources/2011/Term2/Knox _wet_weather.pdf\">Knox _wet_weather</a> Cancellations, please see <a target=\"_blank\" href=\"http://www.twitter.com/SACWetWeather\">twitter page</a> for further news.</li>\r\n<li><a target=\"_blank\" href=\"/Intranet/students/news_resources/2011/Term2/2011_Football_round_2.pdf\">2011 Football draw Round 2</a></li>\r\n<li><a target=\"_blank\" href=\"/Intranet/students/news_resources/2011/Term2/2011_Rugby_round_2.pdf\">2011 Rugby draw Round 2</a></li></ul>\r\n<p></p></div>", "Category": {
"__deferred": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)/Category"
}
}, "Published": "\/Date(1308342960000)\/", "ContentTypeID": "0x0110001F9F7104FDD3054AAB40D8561196E09E", "ApproverComments": null, "Comments": {
"__deferred": {
"uri": "/_vti_bin/ListData.svc/Posts(4)/Comments"
}
}, "CommentsId": 0, "ApprovalStatus": "0", "Id": 4, "ContentType": "Post", "Modified": "\/Date(1309122092000)\/", "Created": "\/Date(1309120597000)\/", "CreatedBy": {
"__deferred": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)/CreatedBy"
}
}, "CreatedById": 1, "ModifiedBy": {
"__deferred": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)/ModifiedBy"
}
}, "ModifiedById": 1, "Owshiddenversion": 2, "Version": "1.0", "Path": "/Students/news/Lists/Posts"
},

I can't wrap my head around editing this. Converting it to a Python dictionary seems to jumble the order of the attributes, making it impossible for me to find where one object ends and the next begins. What's the best way for me to extract just the 'Title', 'Body' and 'Published' keys and values, and how would I do it for multiple objects?

Upvotes: 1

Answers (2)

Finglas

Reputation: 15707

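import json

# json_input holds the JSON text as a string
obj = json.loads(json_input)

for record in obj:
    # Index the loop variable, not obj; key names are case-sensitive
    print(record["Title"])
    print(record["Body"])
    print(record["Published"])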

This presumes that json_input holds the snippet above as a string, for example after reading it in from a file. Based on your question, I've also presumed that the snippet is one item in a collection.

Update

Based on the example, you have another layer that was not present in the snippet originally posted.

Change the loop to be:

for record in obj["d"]["results"]:
    ...
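
Putting the pieces together, here is a minimal sketch of pulling just the three fields out of every record. The helper name extract_fields and the two-record sample are made up for illustration, and the "d" / "results" wrapper is assumed from the update above; adjust the path if your file is shaped differently.

```python
import json

# The only keys we want to keep from each record
WANTED = ("Title", "Body", "Published")


def extract_fields(json_text, wanted=WANTED):
    """Return one small dict per record, keeping only the wanted keys."""
    data = json.loads(json_text)
    # Assumption: the records live under data["d"]["results"]
    records = data["d"]["results"]
    # .get() yields None instead of raising if a record lacks a key
    return [{key: record.get(key) for key in wanted} for record in records]


# Hypothetical sample with the same shape as the real file, trimmed down
sample = json.dumps({"d": {"results": [
    {"Title": "Term 2 Round 2 draws", "Body": "<p>draws</p>",
     "Published": "/Date(1308342960000)/", "Id": 4},
    {"Title": "Second post", "Body": "<p>more</p>",
     "Published": "/Date(1309122092000)/", "Id": 5},
]}})

posts = extract_fields(sample)
for post in posts:
    print(post["Title"], post["Published"])
```

Because each record is looked up by key, the order in which the dictionary stores its attributes doesn't matter.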

Upvotes: 1

Hubro

Reputation: 59388

I'm assuming your main JSON object is an array of those objects. Here's how I'd print out the information you're after:

import json

# json.load expects an open file object, not a file name
with open('my_json_file.json') as json_file:
    main_array = json.load(json_file)

for sub_object in main_array:
    print("Title: {}\nBody: {}\nPublished: {}\n".format(
        sub_object['Title'], sub_object['Body'], sub_object['Published']
    ))
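
If you also want 'Published' as a real date, something like the sketch below may help. It assumes the number inside /Date(...)/ is milliseconds since the Unix epoch in UTC, which is the usual convention for this Microsoft date format; parse_ms_date is a made-up helper name, and it deliberately rejects variants that carry a timezone offset.

```python
import re
from datetime import datetime, timedelta

# After json.load, "\/Date(1308342960000)\/" becomes "/Date(1308342960000)/"
_DATE_RE = re.compile(r"/Date\((-?\d+)\)/")
_EPOCH = datetime(1970, 1, 1)


def parse_ms_date(value):
    """Convert a '/Date(<milliseconds>)/' string to a naive UTC datetime."""
    match = _DATE_RE.match(value)
    if match is None:
        raise ValueError("unrecognised date string: %r" % value)
    # Assumption: the captured number is milliseconds since 1970-01-01 UTC
    return _EPOCH + timedelta(milliseconds=int(match.group(1)))


published = parse_ms_date("/Date(1308342960000)/")
print(published.isoformat())
```

You could call parse_ms_date(sub_object['Published']) inside the loop above.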

Upvotes: 1
