Reputation: 21
If I run my code without the final line, getVal(tweet['retweeted_status']['favorite_count']), the scrape works. When I add that line, I get the error KeyError: 'retweeted_status'.
Does anyone know what I'm doing wrong?
q = "David_Cameron"
results = twitter_user_timeline(twitter_api, q)
print len(results)
# Show one sample search result by slicing the list...
# print json.dumps(results[0], indent=1)
csvfile = open(q + '_timeline.csv', 'w')
csvwriter = csv.writer(csvfile)
csvwriter.writerow(['created_at',
                    'user-screen_name',
                    'text',
                    'coordinates lng',
                    'coordinates lat',
                    'place',
                    'user-location',
                    'user-geo_enabled',
                    'user-lang',
                    'user-time_zone',
                    'user-statuses_count',
                    'user-followers_count',
                    'user-created_at'])
for tweet in results:
    csvwriter.writerow([tweet['created_at'],
                        getVal(tweet['user']['screen_name']),
                        getVal(tweet['text']),
                        getLng(tweet['coordinates']),
                        getLat(tweet['coordinates']),
                        getPlace(tweet['place']),
                        getVal(tweet['user']['location']),
                        getVal(tweet['user']['geo_enabled']),
                        getVal(tweet['user']['lang']),
                        getVal(tweet['user']['time_zone']),
                        getVal(tweet['user']['statuses_count']),
                        getVal(tweet['user']['followers_count']),
                        getVal(tweet['user']['created_at']),
                        getVal(tweet['retweeted_status']['favorite_count']),
                        ])
print "done"
Upvotes: 2
Views: 686
Reputation: 21
FYI, for anyone who sees this in the future: I got the code to work using the version below. getVal(tweet['favorite_count']) gives the favorite count for a tweet.
q = "SkyNews"
results = twitter_user_timeline(twitter_api, q)
csvfile = open(q + '_timeline.csv', 'w')
csvwriter = csv.writer(csvfile)
csvwriter.writerow(['created_at',
                    'user-screen_name',
                    'text',
                    'language',
                    'coordinates lng',
                    'coordinates lat',
                    'place',
                    'user-location',
                    'user-geo_enabled',
                    'user-lang',
                    'user-time_zone',
                    'user-statuses_count',
                    'user-followers_count',
                    'user-friend_count',
                    'user-created_at',
                    'favorite_count',
                    'retweet_count',
                    'user-mentions',
                    'urls',
                    'hashtags',
                    'symbols'])
for tweet in results:
    csvwriter.writerow([tweet['created_at'],
                        getVal(tweet['user']['screen_name']),
                        getVal(tweet['text']),
                        getVal(tweet['lang']),
                        getLng(tweet['coordinates']),
                        getLat(tweet['coordinates']),
                        getPlace(tweet['place']),
                        getVal(tweet['user']['location']),
                        getVal(tweet['user']['geo_enabled']),
                        getVal(tweet['user']['lang']),
                        getVal(tweet['user']['time_zone']),
                        getVal(tweet['user']['statuses_count']),
                        getVal(tweet['user']['followers_count']),
                        getVal(tweet['user']['friends_count']),
                        getVal(tweet['user']['created_at']),
                        getVal(tweet['favorite_count']),
                        getVal(tweet['retweet_count']),
                        tweet['entities']['user_mentions'],
                        tweet['entities']['urls'],
                        tweet['entities']['hashtags'],
                        tweet['entities']['symbols'],
                        ])
print "done"
where getVal, getLng, getLat and getPlace are defined earlier in the code by:
def getVal(val):
    # Booleans and integers pass through unchanged; non-empty strings are
    # encoded as UTF-8; anything else (None, empty string) becomes "".
    clean = ""
    if isinstance(val, bool):
        return val
    if isinstance(val, int):
        return val
    if val:
        clean = val.encode('utf-8')
    return clean

def getLng(val):
    # Returns None implicitly when the tweet has no coordinates.
    if isinstance(val, dict):
        return val['coordinates'][0]

def getLat(val):
    if isinstance(val, dict):
        return val['coordinates'][1]

def getPlace(val):
    if isinstance(val, dict):
        return val['full_name'].encode('utf-8')
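To make the behaviour of the coordinate helpers concrete, here is a minimal sketch that runs them on two made-up inputs. The names get_lng and get_lat, and the sample values, are illustrative assumptions rather than part of the original script. It relies on the fact that the 'coordinates' field of a tweet is a GeoJSON point, which stores longitude first and latitude second, and is null when the tweet is not geotagged.

```python
# Illustrative copies of the helpers above (hypothetical names),
# with the implicit "return None" made explicit.
def get_lng(val):
    # GeoJSON points are ordered [longitude, latitude].
    if isinstance(val, dict):
        return val['coordinates'][0]
    return None


def get_lat(val):
    if isinstance(val, dict):
        return val['coordinates'][1]
    return None


# Made-up sample values shaped like tweet['coordinates'].
geotagged = {'type': 'Point', 'coordinates': [-0.1276, 51.5034]}
not_geotagged = None

print(get_lng(geotagged), get_lat(geotagged))
print(get_lng(not_geotagged), get_lat(not_geotagged))
```

Returning None for missing coordinates is harmless here, because csv.writer writes None as an empty field.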
Upvotes: 0
Reputation: 74
According to the API documentation at https://dev.twitter.com/overview/api/tweets, the retweeted_status attribute may or may not exist: it is only present when the tweet is a retweet.
If it does not exist, looking it up raises a KeyError. You can either check for the key first with the in operator:
retweeted_favourite_count = tweet['retweeted_status']['favorite_count'] if 'retweeted_status' in tweet else None
or assume it is there and handle the case where it is not:
try:
    retweeted_favourite_count = tweet['retweeted_status']['favorite_count']
except KeyError:
    retweeted_favourite_count = 0
Then pass the retweeted_favourite_count value to the writerow call.
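As a rough illustration of the two lookup styles, the sketch below runs both against two made-up tweet dictionaries, one a retweet and one not. The sample data and the function names count_with_in and count_with_try are assumptions for demonstration only. The key is spelled 'favorite_count', as in the question's code.

```python
# Made-up tweets, trimmed to the fields this example needs.
retweet = {
    'text': 'RT @someone: hello',
    'favorite_count': 0,
    'retweeted_status': {'text': 'hello', 'favorite_count': 12},
}
original = {'text': 'hello world', 'favorite_count': 3}


def count_with_in(tweet):
    # Check that the key exists before indexing into it.
    if 'retweeted_status' in tweet:
        return tweet['retweeted_status']['favorite_count']
    return None


def count_with_try(tweet):
    # Index directly and recover if the key is missing.
    try:
        return tweet['retweeted_status']['favorite_count']
    except KeyError:
        return None


for tweet in (retweet, original):
    print(count_with_in(tweet), count_with_try(tweet))
```

One difference to keep in mind: the try/except version also swallows a KeyError raised by a missing 'favorite_count' inside retweeted_status, whereas the in version would still raise in that case.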
Also, your CSV header row is missing a column name for the retweeted favourite count.
Updated example:
for tweet in results:
    # Notice this is one long line, not two rows.
    retweeted_favourite_count = tweet['retweeted_status']['favorite_count'] if 'retweeted_status' in tweet else None
    csvwriter.writerow([tweet['created_at'],
                        getVal(tweet['user']['screen_name']),
                        getVal(tweet['text']),
                        getLng(tweet['coordinates']),
                        getLat(tweet['coordinates']),
                        getPlace(tweet['place']),
                        getVal(tweet['user']['location']),
                        getVal(tweet['user']['geo_enabled']),
                        getVal(tweet['user']['lang']),
                        getVal(tweet['user']['time_zone']),
                        getVal(tweet['user']['statuses_count']),
                        getVal(tweet['user']['followers_count']),
                        getVal(tweet['user']['created_at']),
                        # And insert it here instead
                        getVal(retweeted_favourite_count),
                        ])
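To show the header and data rows staying aligned once the extra column is added, here is a small self-contained sketch. It is written for Python 3 and writes to an in-memory buffer instead of a file. The two sample tweets and the column name 'retweeted_favorite_count' are assumptions for illustration, and the getVal encoding step is left out to keep it short.

```python
import csv
import io

# Made-up tweets, trimmed to a few fields; only the first is a retweet.
sample_tweets = [
    {'created_at': 'Mon Jan 05 10:00:00 +0000 2015',
     'text': 'RT @someone: hello',
     'retweeted_status': {'favorite_count': 12}},
    {'created_at': 'Mon Jan 05 11:00:00 +0000 2015',
     'text': 'hello world'},
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')

# One header entry per value written in each data row.
writer.writerow(['created_at', 'text', 'retweeted_favorite_count'])

for tweet in sample_tweets:
    if 'retweeted_status' in tweet:
        count = tweet['retweeted_status']['favorite_count']
    else:
        count = None  # csv.writer writes None as an empty field
    writer.writerow([tweet['created_at'], tweet['text'], count])

output = buffer.getvalue()
print(output)
```

The row for the non-retweet ends with an empty last field, so every row keeps the same number of columns as the header.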
You could also replace the line:
getVal(tweet['retweeted_status']['favorite_count'])
with this, as Padriac Cunningham suggested:
getVal(tweet.get('retweeted_status', {}).get('favorite_count', None))
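The chained dict.get idea can be tried on its own with the sketch below, written for Python 3. The sample tweets are made up, and dig is a hypothetical helper (not part of any Twitter library) that generalises the same idea to any depth. The key is spelled 'favorite_count', as in the question's code.

```python
def dig(mapping, *keys, default=None):
    """Hypothetical helper: follow keys through nested dicts.

    Returns default as soon as a key is missing or a non-dict is reached.
    """
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up tweets, trimmed to the fields this example needs.
retweet = {'text': 'RT @someone: hello',
           'retweeted_status': {'favorite_count': 12}}
original = {'text': 'hello world'}

# Chained .get: the empty dict stands in for a missing 'retweeted_status'.
print(retweet.get('retweeted_status', {}).get('favorite_count'))
print(original.get('retweeted_status', {}).get('favorite_count'))

# The same lookups through the helper, with a custom default for the second.
print(dig(retweet, 'retweeted_status', 'favorite_count'))
print(dig(original, 'retweeted_status', 'favorite_count', default=0))
```

Note that the chained .get form only protects against a missing key. If 'retweeted_status' were present but set to None, the second .get would raise an AttributeError, which the helper avoids by checking the type at each step.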
Upvotes: 2