user4803385

Reputation: 13

KeyErrors while reading Twitter json files in Python

I am trying to analyze a JSON file of data I collected from Twitter, but when I look up a key, Python says it is not found even though I can see it is there. I tried two different approaches, posted below. Any advice would be great.

Attempt #1:

import sys
import os
import numpy as np
import scipy
import matplotlib.pyplot as plt
import json
import pandas as pan

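tweets_data = []  # parsed tweets, one dict per line of the file
with open('twitter_data.txt', 'r') as tweets_file:
    for line in tweets_file:
        try:
            tweet = json.loads(line)
            tweets_data.append(tweet)
        except ValueError:
            # skip lines that are not valid JSON
            continue
tweets = pan.DataFrame()
# raises KeyError if any parsed record has no 'text' key
tweets['text'] = map(lambda tweet: tweet['text'], tweets_data)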

Attempt #2: Same steps as before, but with a list comprehension instead of map:

t=tweets[0]
tweet_text = [t['text'] for t in tweets]

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
KeyError: 'text'

If I print tweets_data, this is what I see. 'text', etc., is definitely there. Am I missing a character?

>>> print(tweets_data[0])
    {u'contributors': None, u'truncated': False, u'text': u'RT
    @iHippieVibes: \u2b50\ufe0fFAV For This Lace Cardigan \n\nUSE Discount
    code for 10% off: SOLO\n\nFree Shipping\n\nhttp://t.co/d8kiIt3J5f
    http://t.c\u2026', u'in_reply_to_status....

(pasted only part of the output)

Thanks! Any suggestions would be greatly appreciated.

Upvotes: 1

Views: 735

Answers (1)

Martijn Pieters

Reputation: 1121296

Not all of your tweets have a 'text' key. Either filter those records out or use dict.get() to return a default value:

tweet_text = [t['text'] for t in tweets if 'text' in t]

or

tweet_text = [t.get('text', '') for t in tweets]
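The same filter can also be applied while the file is being read. The following is only a minimal sketch under stated assumptions: the helper name load_tweet_texts and the sample records are hypothetical, and the 'delete' record is just an assumed example of a line that parses as JSON but carries no 'text' key.

```python
import json


def load_tweet_texts(lines):
    """Return the 'text' value of every line that parses as a JSON object with that key.

    Hypothetical helper for illustration; `lines` can be an open file or any
    iterable of strings.
    """
    texts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        # keep only JSON objects that actually carry a 'text' key
        if isinstance(record, dict) and 'text' in record:
            texts.append(record['text'])
    return texts


# Made-up sample input: two tweets, one record without 'text', one blank line.
sample = [
    '{"text": "hello", "id": 1}',
    '{"delete": {"status": {"id": 2}}}',
    '',
    '{"text": "world", "id": 3}',
]
print(load_tweet_texts(sample))
```

With a real file, the call would look like load_tweet_texts(open('twitter_data.txt')). Catching ValueError rather than using a bare except keeps unrelated errors, such as a misspelled variable name, from being silently swallowed.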

Upvotes: 2
