user2030461

Trying to parse data from the Twitter API

I'm having a lot of trouble reading Twitter data into Python. My tweets are in the following format: http://pastebin.com/b4ZAUPsY

I have tried to load each tweet with json.loads() in Python, but I keep hitting errors. The ValueError message is not specific enough to show me what's wrong, and I have been struggling to find the error by eye.

I also tried ast.literal_eval(), hoping to load the data directly as a dictionary, but I couldn't get that to work either.

I would really appreciate any advice on what to do!

Upvotes: 1

Views: 127

Answers (2)

Stephen Briney

Reputation: 937

This is not valid JSON. One of the problems is the value None, as in:

"contributors": None
  • The None should be changed to null (without quotes).
  • The strings should not be prefixed with 'u'.
  • True and False should be true and false (without quotes).

See Wikipedia https://en.m.wikipedia.org/wiki/JSON
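
To illustrate the points above, here is a minimal sketch of the difference between a printed Python dictionary and JSON. The record is a shortened, made-up sample in the same style as the pasted tweets, and the variable names are only for illustration. It assumes Python 3.3 or later, where the u prefix is still accepted.

```python
import ast
import json

# Shortened, made-up sample: this is the repr of a Python dict, not JSON.
python_repr = "{u'contributors': None, u'truncated': False, u'favorite_count': 0}"

# json.loads() rejects it: JSON needs double-quoted keys and null/false.
try:
    json.loads(python_repr)
    parsed_as_json = True
except ValueError:
    parsed_as_json = False

# ast.literal_eval() understands Python literals, including the u prefix.
record = ast.literal_eval(python_repr)

# json.dumps() writes real JSON: null/false, double quotes, no u prefix.
as_json = json.dumps(record, sort_keys=True)
print(parsed_as_json)
print(as_json)
```

If the data can be read as a Python literal first, json.dumps() does the None/True/False and quoting conversions for you, so there is no need to edit the text by hand.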

The data you have is almost valid Python (Python 2, given the u prefixes and the L suffix on the long integer) and can be parsed with the following code:

import re

a = '{u"contributors": None, u"truncated": False, u"text": u"Uber Germany retreats to Berlin, Munich https://t.co/OUTjo2vMgb", u"is_quote_status": False, u"in_reply_to_status_id": None, u"id": 660902084456288256L, u"favorite_count": 0, u"source": u"<a href="http://www.snsanalytics.com" rel="nofollow">SNS Analytics</a>", u"retweeted": False, u"coordinates": None, u"timestamp_ms": u"1446406310558", u"entities": {u"user_mentions": [], u"symbols": [], u"hashtags": [], u"urls": [{u"url": u"https://t.co/OUTjo2vMgb", u"indices": [40, 63], u"expanded_url": u"http://www.snsanalytics.com/iV9Oy0", u"display_url": u"snsanalytics.com/iV9Oy0"}]}, u"in_reply_to_screen_name": None, u"id_str": u"660902084456288256", u"retweet_count": 0, u"in_reply_to_user_id": None, u"favorited": False, u"user": {u"follow_request_sent": None, u"profile_use_background_image": True, u"default_profile_image": False, u"id": 119396644, u"verified": False, u"profile_image_url_https": u"https://pbs.twimg.com/profile_images/1225936492/Munich_normal.jpg", u"profile_sidebar_fill_color": u"DDEEF6", u"profile_text_color": u"333333", u"followers_count": 3701, u"profile_sidebar_border_color": u"C0DEED", u"id_str": u"119396644", u"profile_background_color": u"C0DEED", u"listed_count": 59, u"profile_background_image_url_https": u"https://pbs.twimg.com/profile_background_images/197414716/munich_places.jpg", u"utc_offset": 3600, u"statuses_count": 29594, u"description": None, u"friends_count": 397, u"location": u"Munich, Germany", u"profile_link_color": u"0084B4", u"profile_image_url": u"http://pbs.twimg.com/profile_images/1225936492/Munich_normal.jpg", u"following": None, u"geo_enabled": False, u"profile_background_image_url": u"http://pbs.twimg.com/profile_background_images/197414716/munich_places.jpg", u"name": u"Munich Daily", u"lang": u"en", u"profile_background_tile": True, u"favourites_count": 0, u"screen_name": u"MunichDaily", u"notifications": None, u"url": None, u"created_at": u"Wed Mar 03 14:31:12 +0000 2010", u"contributors_enabled": False, u"time_zone": u"Amsterdam", u"protected": False, u"default_profile": False, u"is_translator": False}, u"geo": None, u"in_reply_to_user_id_str": None, u"possibly_sensitive": False, u"lang": u"en", u"created_at": u"Sun Nov 01 19:31:50 +0000 2015", u"filter_level": u"low", u"in_reply_to_status_id_str": None, u"place": None}'
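# Drop the opening of the HTML anchor tag so that only the URL is left in the string.
a = re.sub(', u"source": u"<a href=', ', u"source": u', a)
# Drop the rest of the anchor tag, which follows the URL's closing quote.
a = re.sub(' rel="nofollow">SNS Analytics</a>",', ',', a)
# Evaluate the repaired text as a Python literal (safer than a bare eval()).
import ast
a = ast.literal_eval(a)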

It is not quite valid Python syntax because of this part:

u"source": u"<a href="http://www.snsanalytics.com" rel="nofollow">SNS Analytics</a>"

The HTML anchor tag embedded in this string contains double quotes that are not escaped, so the string literal ends early.

The code above converts it to:

u"source": u"http://www.snsanalytics.com"
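
The two substitutions above are tied to the "SNS Analytics" client name. A slightly more general version might look like the sketch below. It assumes every source field follows the same href/rel="nofollow" layout as this tweet. The names SOURCE_FIELD and fix_source are made up for the example, and the sample record is shortened so that it also runs under Python 3.

```python
import ast
import re

# Assumed layout of the field: u"source": u"<a href="URL" rel="nofollow">NAME</a>"
SOURCE_FIELD = re.compile(r'u"source": u"<a href="([^"]*)" rel="nofollow">[^<]*</a>"')


def fix_source(raw):
    """Replace the whole anchor tag with just the URL it points to."""
    return SOURCE_FIELD.sub(r'u"source": u"\1"', raw)


# Shortened sample in the same style as the pasted tweet.
raw = ('{u"id": 1, u"source": u"<a href="http://www.snsanalytics.com" '
       'rel="nofollow">SNS Analytics</a>", u"retweeted": False}')

tweet = ast.literal_eval(fix_source(raw))
print(tweet["source"])
```

This keeps only the URL, as the code above does. If the client name matters to you, capture it with a second group instead of discarding it.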

Upvotes: 1

keda

Reputation: 569

Your JSON is not valid.

Issues:

  • None should become null
  • True should become true
  • False should become false
  • Strings cannot contain unescaped double quotes. Change the inner quotes to single quotes or escape them.
    • "source": "<a href="http://www.snsanalytics.com" rel="nofollow">SNS Analytics</a>" should become "source": "<a href='http://www.snsanalytics.com' rel='nofollow'>SNS Analytics</a>"
  • You have a long in there that ends in an L: 660902084456288256L. Remove the L so it is just 660902084456288256.
  • Also, make sure there are no u prefixes in front of any strings when you parse it. They may only be there because of how Python printed the unicode strings, so just double-check.
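
As a rough sketch of the L fix in code rather than by hand: the pattern below assumes a long only ever appears as a value directly after ": " and before a comma or closing bracket, which holds for the pasted tweet but may not hold for all data. LONG_SUFFIX is a made-up name and the record is a shortened sample.

```python
import ast
import json
import re

# A Python 2 long used as a value, e.g. `: 660902084456288256L,`
LONG_SUFFIX = re.compile(r'(?<=: )(\d+)L(?=[,}\]])')

# Shortened sample in the same style as the pasted tweet.
raw = '{u"id": 660902084456288256L, u"id_str": u"660902084456288256", u"place": None}'

cleaned = LONG_SUFFIX.sub(r'\1', raw)      # remove the trailing L
tweet = ast.literal_eval(cleaned)          # read it as a Python literal
valid_json = json.dumps(tweet, sort_keys=True)
print(valid_json)
```

The digits inside the quoted id_str value are left alone because they are not followed by an L.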

Here is the valid JSON: http://pastebin.com/tqGscNhA

In the future, you can use JSONLint to validate your data: http://jsonlint.com/
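
You can also do a similar check locally. The sketch below assumes Python 3.5 or later, where json raises json.JSONDecodeError (a subclass of ValueError) carrying the position of the problem. describe_json_error is a made-up helper name.

```python
import json


def describe_json_error(text):
    """Return None if text is valid JSON, otherwise a message with the error location."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return "line %d, column %d: %s" % (err.lineno, err.colno, err.msg)
    return None


print(describe_json_error('{"contributors": null}'))
print(describe_json_error('{"contributors": None}'))
```

This only reports the first problem it finds, so you may need to fix and re-run a few times.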

Check out http://json.org/. On the right side there is a white rectangular focus block that specifies the correct syntax and all valid types.

Upvotes: 0
