Reputation: 497
I have a file containing a list of tweet IDs and I want to retrieve those tweets. The file contains more than 100,000 tweet IDs, but the Twitter API only allows retrieving 100 per request.
api = tweepy.API(auth)
good_tweet_ids = [i for i in por.TweetID[0:100]]
tweets = api.statuses_lookup(good_tweet_ids)
for tweet in tweets:
    print(tweet.text)
Is there a way to retrieve more tweets, say 1,000 or 2,000? I don't want to take a sample of the data, save the results to a file, and change the index of the tweet IDs every time. Is there a way to do that?
Upvotes: 0
Views: 8016
Reputation: 21
An addition to the answer that defines lookup_tweets. Each returned tweet is a tweepy Status object. The following code converts the results into serializable JSON and then merges them with the original data on the tweet ID to get a full DataFrame.
import json
from io import StringIO

import pandas as pd

df = pd.read_csv('your.csv')
good_tweet_ids = [i for i in df.TweetID]  # tweet IDs to look up
results = lookup_tweets(good_tweet_ids, api)  # lookup_tweets and api come from the other answer

# Wrangle the data into one DataFrame
temp = json.dumps([status._json for status in results])  # serialize the raw payloads
newdf = pd.read_json(StringIO(temp), orient='records')  # StringIO avoids the literal-string deprecation in newer pandas
full = pd.merge(df, newdf, left_on='TweetID', right_on='id', how='left').drop('id', axis=1)
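Because the merge uses how='left', any ID that the lookup did not return (a deleted or protected tweet, for instance) ends up as a row of missing values. It can help to list those IDs explicitly before merging. The following is only a minimal sketch: find_missing_ids is a hypothetical helper, not part of tweepy or pandas, and the plain dicts stand in for the status._json payloads.

```python
def find_missing_ids(requested_ids, payloads):
    """Return the requested IDs that have no matching payload.

    `payloads` is assumed to be a list of dicts shaped like
    status._json, i.e. each one carries an integer "id" key.
    """
    returned = {payload["id"] for payload in payloads}
    # Preserve the order of the original request
    return [tweet_id for tweet_id in requested_ids if tweet_id not in returned]


if __name__ == "__main__":
    # Made-up stand-ins for [status._json for status in results]
    fake_payloads = [{"id": 1, "text": "a"}, {"id": 3, "text": "c"}]
    print(find_missing_ids([1, 2, 3, 4], fake_payloads))
```

With real data this would be called as find_missing_ids(good_tweet_ids, [status._json for status in results]).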
Upvotes: 2
Reputation: 10359
Yes. Twitter only lets you look up 100 tweets per request, but you can look up another 100 immediately after that. The only concern then is rate limits: you are restricted in the number of calls you can make to the API in each 15-minute window. Fortunately, tweepy handles this gracefully if you create the API with wait_on_rate_limit=True. All we need to do, then, is split the full list of tweet IDs into batches of 100 or fewer (if you have 130, the second batch should contain only the final 30) and look the batches up one at a time. Try the following:
import tweepy


def lookup_tweets(tweet_IDs, api):
    full_tweets = []
    tweet_IDs = list(tweet_IDs)  # accept a pandas Series as well as a list
    try:
        # Step through the IDs 100 at a time; the last slice is
        # automatically shorter if fewer than 100 IDs remain
        for start in range(0, len(tweet_IDs), 100):
            full_tweets.extend(
                api.statuses_lookup(tweet_IDs[start:start + 100])
            )
    except tweepy.TweepError:
        print('Something went wrong, quitting...')
    # Return whatever was collected, even if a request failed part-way
    return full_tweets
consumer_key = 'XXX'
consumer_secret = 'XXX'
access_token = 'XXX'
access_token_secret = 'XXX'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
# do whatever it is to get por.TweetID - the list of all IDs to look up
results = lookup_tweets(por.TweetID, api)
for tweet in results:
    if tweet:
        print(tweet.text)
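The batching step can also be separated from the API call, which makes it easy to check on its own without credentials. Below is a minimal sketch under that assumption: chunked and lookup_in_batches are hypothetical helper names, not part of tweepy, and fetch stands in for whatever call looks up one batch (for example api.statuses_lookup).

```python
def chunked(items, size=100):
    """Yield consecutive lists of at most `size` items."""
    items = list(items)  # also accepts a pandas Series or any iterable
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lookup_in_batches(tweet_ids, fetch, size=100):
    """Call `fetch` once per batch and collect everything it returns.

    `fetch` is an assumption: any callable that takes a list of IDs
    and returns a list of results, e.g. api.statuses_lookup.
    """
    results = []
    for batch in chunked(tweet_ids, size):
        results.extend(fetch(batch))
    return results


if __name__ == "__main__":
    # Stand-in for the API so the sketch runs offline
    def fake_fetch(batch):
        return ["tweet %d" % tweet_id for tweet_id in batch]

    print([len(batch) for batch in chunked(range(130))])
    print(len(lookup_in_batches(range(130), fake_fetch)))
```

Stepping through the list with range(0, len(items), size) avoids the extra empty batch that a count-based loop produces when the number of IDs is an exact multiple of 100.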
Upvotes: 10