Chelsea88

Reputation: 21

How do I scrape the favorite_count of a tweet on someone's timeline?

My scrape works if I leave out the final line, getVal(tweet['retweeted_status']['favorite_count']). When I add that line, I get the error KeyError: 'retweeted_status'.

Does anyone know what I'm doing wrong?

q = "David_Cameron"
results = twitter_user_timeline(twitter_api, q)
print len(results)
# Show one sample search result by slicing the list...
# print json.dumps(results[0], indent=1)
csvfile = open(q + '_timeline.csv', 'w')
csvwriter = csv.writer(csvfile)
csvwriter.writerow(['created_at',
                'user-screen_name',
                'text',
                'coordinates lng',
                'coordinates lat',
                'place',
                'user-location',
                'user-geo_enabled',
                'user-lang',
                'user-time_zone',
                'user-statuses_count',
                'user-followers_count',
                'user-created_at'])
for tweet in results:
    csvwriter.writerow([tweet['created_at'],
                    getVal(tweet['user']['screen_name']),
                    getVal(tweet['text']),
                    getLng(tweet['coordinates']),
                    getLat(tweet['coordinates']),
                    getPlace(tweet['place']),
                    getVal(tweet['user']['location']),
                    getVal(tweet['user']['geo_enabled']),
                    getVal(tweet['user']['lang']),
                    getVal(tweet['user']['time_zone']),
                    getVal(tweet['user']['statuses_count']),
                    getVal(tweet['user']['followers_count']),
                    getVal(tweet['user']['created_at']), 
                    getVal(tweet['retweeted_status']['favorite_count']),
                    ])
print "done"

Upvotes: 2

Views: 686

Answers (2)

Chelsea88

Reputation: 21

FYI, for anyone who sees this in the future: I got the code to work using the version below. getVal(tweet['favorite_count']) gives the favorite count for a tweet.

q = "SkyNews"
results = twitter_user_timeline(twitter_api, q)
csvfile = open(q + '_timeline.csv', 'w')
csvwriter = csv.writer(csvfile)
csvwriter.writerow(['created_at',
                'user-screen_name',
                'text',
                'language',
                'coordinates lng',
                'coordinates lat',
                'place',
                'user-location',
                'user-geo_enabled',
                'user-lang',
                'user-time_zone',
                'user-statuses_count',
                'user-followers_count',
                'user-friend_count',
                'user-created_at', 
                'favorite_count',
                'retweet_count',
                'user-mentions',
                'urls',
                'hashtags',
                'symbols'])
for tweet in results:
    csvwriter.writerow([tweet['created_at'],
                    getVal(tweet['user']['screen_name']),
                    getVal(tweet['text']),
                    getVal(tweet['lang']),
                    getLng(tweet['coordinates']),
                    getLat(tweet['coordinates']),
                    getPlace(tweet['place']),
                    getVal(tweet['user']['location']),
                    getVal(tweet['user']['geo_enabled']),
                    getVal(tweet['user']['lang']),
                    getVal(tweet['user']['time_zone']),
                    getVal(tweet['user']['statuses_count']),
                    getVal(tweet['user']['followers_count']),
                    getVal(tweet['user']['friends_count']),
                    getVal(tweet['user']['created_at']), 
                    getVal(tweet['favorite_count']),
                    getVal(tweet['retweet_count']),
                    tweet['entities']['user_mentions'],
                    tweet['entities']['urls'],
                    tweet['entities']['hashtags'],
                    tweet['entities']['symbols'],
                    ])

print "done"

where getVal, getLng, getLat and getPlace are defined earlier in the code:

def getVal(val):
    clean = ""
    if isinstance(val, bool):
        return val
    if isinstance(val, int):
        return val
    if val:
        clean = val.encode('utf-8')
    return clean

def getLng(val):
    if isinstance(val, dict):
        return val['coordinates'][0]

def getLat(val):
    if isinstance(val, dict):
        return val['coordinates'][1]

def getPlace(val):
    if isinstance(val, dict):
        return val['full_name'].encode('utf-8')
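
The helpers above rely on Python 2 string handling. Under Python 3 (an assumption, since the posted code is Python 2), str.encode returns bytes, and csv.writer would write those as b'...' literals. A minimal sketch of equivalent helpers that keep text as str might look like this. The snake_case names are hypothetical and not part of the original code:

```python
def get_val(val):
    """Pass bools and ints through, map None/empty to '', keep text as str."""
    if isinstance(val, (bool, int)):
        return val
    return val if val else ""


def get_lng(val):
    """Longitude from a point dict whose 'coordinates' is [lng, lat], else None."""
    if isinstance(val, dict):
        return val["coordinates"][0]
    return None


def get_lat(val):
    """Latitude from a point dict whose 'coordinates' is [lng, lat], else None."""
    if isinstance(val, dict):
        return val["coordinates"][1]
    return None


def get_place(val):
    """Full place name from a place dict, else None."""
    if isinstance(val, dict):
        return val["full_name"]
    return None
```

As in the original helpers, a missing coordinates or place value yields None, which csv.writer writes as an empty field.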

Upvotes: 0

Jon Warghed

Reputation: 74

According to the API documentation at https://dev.twitter.com/overview/api/tweets, the retweeted_status attribute may or may not exist; it is only present when the tweet is a retweet.

If it does not exist, looking it up raises a KeyError. You can either check for it first with the in operator:

retweeted_favourite_count = tweet['retweeted_status']['favorite_count'] if 'retweeted_status' in tweet else None

or assume it is there and handle the case where it is not:

try:
    retweeted_favourite_count = tweet['retweeted_status']['favorite_count']
except KeyError:
    retweeted_favourite_count = 0

Then pass the retweeted_favourite_count value to the writerow call.

Also, your CSV header row is missing a column name for the retweeted favourite count.
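
One way to keep the header and the data rows from drifting apart is to define each column name next to the function that extracts it. The following is only a sketch under assumptions: the COLUMNS list, write_timeline, and the sample tweets are all made up for illustration, and it is written for Python 3 with an in-memory buffer instead of a file.

```python
import csv
import io

# Hypothetical column spec: (header name, function extracting the value from a tweet dict).
COLUMNS = [
    ("created_at", lambda t: t["created_at"]),
    ("text", lambda t: t["text"]),
    # Safe lookup: yields None when the tweet is not a retweet.
    ("retweeted_favorite_count",
     lambda t: t.get("retweeted_status", {}).get("favorite_count")),
]


def write_timeline(tweets, out):
    """Write a header row and one row per tweet, both driven by COLUMNS."""
    writer = csv.writer(out)
    writer.writerow([name for name, _ in COLUMNS])
    for tweet in tweets:
        writer.writerow([extract(tweet) for _, extract in COLUMNS])


# Invented sample data, shaped like the fields used above.
sample_tweets = [
    {"created_at": "Mon Jan 05", "text": "hello"},
    {"created_at": "Tue Jan 06", "text": "RT hello",
     "retweeted_status": {"favorite_count": 7}},
]

buf = io.StringIO()
write_timeline(sample_tweets, buf)
rows = list(csv.reader(io.StringIO(buf.getvalue())))
```

Because csv.writer writes None as an empty string, the non-retweet row simply gets an empty cell in the last column.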

Updated example:

for tweet in results:
    # Notice this is one long line, not two rows.
    retweeted_favourite_count = tweet['retweeted_status']['favorite_count'] if 'retweeted_status' in tweet else None
    csvwriter.writerow([tweet['created_at'],
                    getVal(tweet['user']['screen_name']),
                    getVal(tweet['text']),
                    getLng(tweet['coordinates']),
                    getLat(tweet['coordinates']),
                    getPlace(tweet['place']),
                    getVal(tweet['user']['location']),
                    getVal(tweet['user']['geo_enabled']),
                    getVal(tweet['user']['lang']),
                    getVal(tweet['user']['time_zone']),
                    getVal(tweet['user']['statuses_count']),
                    getVal(tweet['user']['followers_count']),
                    getVal(tweet['user']['created_at']),
                    # And insert it here instead
                    getVal(retweeted_favourite_count),
                    ])

You could also replace the line

getVal(tweet['retweeted_status']['favorite_count'])

with the following, as Padriac Cunningham suggested:

getVal(tweet.get('retweeted_status', {}).get('favorite_count', None))
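
To see how the three lookups behave side by side, here is a small sketch using two invented tweet dicts, one with retweeted_status and one without. The function names and sample values are made up for illustration, and the key is spelled favorite_count as in the question's code.

```python
# Invented sample data: only the fields relevant to the lookup are included.
plain_tweet = {"text": "hello", "favorite_count": 3}
retweet = {
    "text": "RT hello",
    "favorite_count": 0,
    "retweeted_status": {"text": "hello", "favorite_count": 3},
}


def lookup_with_in(tweet):
    """Check for the key first with the in operator."""
    return tweet['retweeted_status']['favorite_count'] if 'retweeted_status' in tweet else None


def lookup_with_try(tweet):
    """Assume the key is there and handle the KeyError when it is not."""
    try:
        return tweet['retweeted_status']['favorite_count']
    except KeyError:
        return None


def lookup_with_get(tweet):
    """Chain dict.get with an empty dict as the fallback for the outer key."""
    return tweet.get('retweeted_status', {}).get('favorite_count')


for lookup in (lookup_with_in, lookup_with_try, lookup_with_get):
    print(lookup.__name__, lookup(plain_tweet), lookup(retweet))
```

All three return None for the plain tweet instead of raising KeyError. Note that the try/except form also swallows a missing favorite_count inside retweeted_status, while the in-operator form would still raise in that case.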

Upvotes: 2
