sharataka

Reputation: 5132

How to fix a TypeError in django related to parsing JSON?

I get a TypeError in the browser when I run the code below. The error occurs on the last line and says 'NoneType' object is not subscriptable (I am trying to get the URLs for all the items). This is odd, because on the command line all the URLs in the feed get printed. Any ideas why the items are printed on the command line but the browser shows an error? How do I fix this?

#reddit parse
try:
    f = urllib.urlopen("http://www.reddit.com/r/videos/top/.json");
except Exception:
    print("ERROR: malformed JSON response from reddit.com")
reddit_posts = json.loads(f.read().decode("utf-8"))["data"]["children"]
reddit_feed=[]
for post in reddit_posts:
    if "oembed" in post['data']['media']:
        print post["data"]["media"]["oembed"]["url"]
        reddit_feed.append(post["data"]["media"]["oembed"]["url"])  
print reddit_feed

edit

if post["data"]["media"]["oembed"]["url"]:
    print post["data"]["media"]["oembed"]["url"]

Upvotes: 1

Views: 402

Answers (1)

K Z

Reputation: 30453

Some posts in the returned JSON have "media": null, so for those posts post['data']['media'] is None and has no oembed field (and hence no url field):

     {
        "kind" : "t3",
        "data" : {
           "downs" : 24050,
           "link_flair_text" : null,
           "media" : null,
           "url" : "http://youtu.be/aNJgX3qH148?t=4m20s",
           "link_flair_css_class" : null,
           "id" : "rymif",
           "edited" : false,
           "num_reports" : null,
           "created_utc" : 1333847562,
           "banned_by" : null,
           "name" : "t3_rymif",
           "subreddit" : "videos",
           "title" : "An awesome young man",
           "author_flair_text" : null,
           "is_self" : false,
           "author" : "Lostinfrustration",
           "media_embed" : {},
           "permalink" : "/r/videos/comments/rymif/an_awesome_young_man/",
           "author_flair_css_class" : null,
           "selftext" : "",
           "domain" : "youtu.be",
           "num_comments" : 2260,
           "likes" : null,
           "clicked" : false,
           "thumbnail" : "http://a.thumbs.redditmedia.com/xUDtCtRFDRAP5gQr.jpg",
           "saved" : false,
           "ups" : 32312,
           "subreddit_id" : "t5_2qh1e",
           "approved_by" : null,
           "score" : 8262,
           "selftext_html" : null,
           "created" : 1333847562,
           "hidden" : false,
           "over_18" : false
        }
     },

Your exception message also doesn't really fit: urlopen can fail with many kinds of exceptions, such as IOError, and the try block does not check whether the response is valid JSON, as your error message implies.
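To illustrate the point about exceptions, here is a minimal sketch that reports fetch failures and parse failures separately. The function name fetch_and_parse and the opener parameter are hypothetical, introduced only for this example; opener stands in for urllib.urlopen (or urllib.request.urlopen on Python 3) so the snippet can run without network access.

```python
import io
import json


def fetch_and_parse(url, opener):
    """Return (parsed_json, error_message); error_message is None on success.

    `opener` is any callable that takes a URL and returns a file-like object,
    e.g. urllib.urlopen on Python 2 or urllib.request.urlopen on Python 3.
    """
    try:
        raw = opener(url).read()
    except IOError as exc:
        # Network-level failure: the request itself did not succeed.
        return None, "could not fetch %s: %s" % (url, exc)

    try:
        return json.loads(raw.decode("utf-8")), None
    except ValueError as exc:
        # Only at this point do we know the body was not valid UTF-8 JSON.
        return None, "malformed JSON from %s: %s" % (url, exc)


# Stand-in opener (an assumption for this example) so it runs offline.
def fake_opener(url):
    return io.BytesIO(b'{"data": {"children": []}}')


feed, error = fetch_and_parse("http://www.reddit.com/r/videos/top/.json", fake_opener)
```

Keeping the two except clauses apart means each message describes what actually went wrong.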

Now, to mitigate the problem, you need to check that post['data']['media'] is not None and that "oembed" is in it; only then can you read post['data']['media']['oembed']['url']. Note that I am assuming every oembed blob has a url (mainly because you need a URL to embed media on reddit).

UPDATE: Namely, something like this should fix your problem:

for post in reddit_posts:
    if isinstance(post['data']['media'], dict) \
           and "oembed" in post['data']['media'] \
           and isinstance(post['data']['media']['oembed'], dict) \
           and 'url' in post['data']['media']['oembed']:
        print post["data"]["media"]["oembed"]["url"]
        reddit_feed.append(post["data"]["media"]["oembed"]["url"])
print reddit_feed

You get that error because for some posts post["data"]["media"] is None, so you are effectively evaluating None["oembed"]. Hence the error: 'NoneType' object is not subscriptable. I've also realized that post['data']['media']['oembed'] may not be a dict, so you also need to verify that it is a dict and that url is in it.
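The failure can be reproduced in isolation with a hand-built post shaped like the one shown above. This is only a sketch: the post dict is made up for the example, and the `or {}` idiom assumes each level is either None or a dict.

```python
# A post whose "media" field is null in the JSON, i.e. None in Python.
post = {"data": {"media": None}}

try:
    post["data"]["media"]["oembed"]
    raised = False
except TypeError:
    # Indexing None is what triggers the error from the question.
    raised = True

# `or {}` swaps None for an empty dict, so each .get() call stays safe.
media = post["data"].get("media") or {}
oembed = media.get("oembed") or {}
url = oembed.get("url")  # None when any level is missing
```

The explicit isinstance checks in the answer are stricter, since they also reject values that are neither None nor a dict.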

Update 2:

It looks like data sometimes won't exist either, so here is the fix:

import json
import urllib

reddit_posts = None
try:
    f = urllib.urlopen("http://www.reddit.com/r/videos/top/.json")
    reddit_posts = json.loads(f.read().decode("utf-8"))
except IOError:
    # urlopen could not fetch the page
    print("ERROR: could not fetch the feed from reddit.com")
except ValueError:
    # the response body was not valid JSON
    print("ERROR: malformed JSON response from reddit.com")

reddit_feed = []
if isinstance(reddit_posts, dict) and "data" in reddit_posts \
   and isinstance(reddit_posts['data'], dict) \
   and 'children' in reddit_posts['data']:
    reddit_posts = reddit_posts["data"]["children"]
    for post in reddit_posts:
        # only read the url when every level on the way to it is a dict
        if isinstance(post['data']['media'], dict) \
               and "oembed" in post['data']['media'] \
               and isinstance(post['data']['media']['oembed'], dict) \
               and 'url' in post['data']['media']['oembed']:
            print post["data"]["media"]["oembed"]["url"]
            reddit_feed.append(post["data"]["media"]["oembed"]["url"])
print reddit_feed
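The repeated isinstance checks can also be folded into a small helper. The sketch below is one possible way to do it, not part of the reddit API: the names dig and extract_oembed_urls are hypothetical, and the function works on an already parsed payload so it can be tried without a network request.

```python
def dig(obj, *keys):
    """Follow keys through nested dicts.

    Returns None as soon as a step is not a dict or lacks the key.
    """
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def extract_oembed_urls(payload):
    """Collect data.media.oembed.url from every post that has one."""
    children = dig(payload, "data", "children")
    if not isinstance(children, list):
        return []
    urls = []
    for post in children:
        url = dig(post, "data", "media", "oembed", "url")
        if url is not None:
            urls.append(url)
    return urls


# Made-up payload covering the cases discussed above.
sample = {"data": {"children": [
    {"data": {"media": None}},                                     # media is null
    {"data": {"media": {"oembed": {"url": "http://example.com/v"}}}},
    {"data": {"media": {"type": "example"}}},                      # no oembed
    {"kind": "t3"},                                                # no data at all
]}}
reddit_feed = extract_oembed_urls(sample)
```

Posts missing any level are skipped silently, which matches the behaviour of the longer if chain.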

Upvotes: 2
