Muhammed Eltabakh

Reputation: 497

Retrieving a list of tweets by tweet ID in tweepy

I have a file containing a list of tweet IDs and I want to retrieve those tweets. The file contains more than 100,000 tweet IDs, but the Twitter API only allows retrieving 100 tweets per request.

api = tweepy.API(auth)
good_tweet_ids = [i for i in por.TweetID[0:100]]
tweets = api.statuses_lookup(good_tweet_ids)
for tweet in tweets:
    print(tweet.text)

Is there a way to retrieve more tweets, say 1000 or 2000? I don't want to take a sample of the data, save the results to a file, and change the index of the tweet IDs every time. Is there a way to do that?

Upvotes: 0

Views: 8016

Answers (2)

nacoder

Reputation: 21

This is an addition to the lookup_tweets code in asongtoruin's answer. The lookup returns a list of Twitter Status objects. The following code converts them into serializable JSON and then merges the result with the original tweet IDs to get a full DataFrame.

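import json
from io import StringIO

import pandas as pd

# lookup_tweets and api are the ones defined in asongtoruin's answer
df = pd.read_csv('your.csv')
good_tweet_ids = list(df.TweetID)  # tweet IDs to look up
results = lookup_tweets(good_tweet_ids, api)  # fetch them in batches

# Wrangle the data into one dataframe
temp = json.dumps([status._json for status in results])  # raw JSON of each status
# StringIO avoids the deprecation warning newer pandas raises for literal JSON strings
newdf = pd.read_json(StringIO(temp), orient='records')
full = pd.merge(df, newdf, left_on='TweetID', right_on='id', how='left').drop('id', axis=1)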
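One thing to keep in mind with the left merge: as far as I know, the lookup endpoint only returns tweets that are still available, so deleted or protected tweets end up as rows with empty columns. Here is a minimal sketch for listing the IDs that were not returned. The `missing_ids` helper is hypothetical (not part of tweepy), and the `SimpleNamespace` objects are stand-ins for real Status objects, assuming only that each one has an `id` attribute.

```python
from types import SimpleNamespace


def missing_ids(requested_ids, statuses):
    """Return the requested IDs that have no matching status, in request order."""
    returned = {status.id for status in statuses}
    return [tweet_id for tweet_id in requested_ids if tweet_id not in returned]


# Stand-in data: four requested IDs, only two of which came back
requested = [101, 102, 103, 104]
fake_results = [SimpleNamespace(id=101), SimpleNamespace(id=104)]

print(missing_ids(requested, fake_results))  # [102, 103]
```

With real data you would pass `good_tweet_ids` and `results` instead of the stand-ins.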

Upvotes: 2

asongtoruin

Reputation: 10359

Yes - Twitter only lets you look up 100 tweets per request, but you can look up another 100 immediately after that. The only concern then is rate limits: you are restricted in the number of calls you can make to the API in each 15-minute window. Fortunately, tweepy can handle this gracefully if you create the API object with wait_on_rate_limit=True. All we need to do, then, is split the full list of tweet IDs into batches of 100 or fewer (if you have 130, the second batch should contain only the final 30) and look the batches up one at a time. Try the following:

import tweepy


def lookup_tweets(tweet_IDs, api):
    """Look up tweets in batches of up to 100 IDs per request."""
    full_tweets = []
    tweet_count = len(tweet_IDs)
    try:
        # Ceiling division, so an exact multiple of 100 adds no empty batch
        for i in range((tweet_count + 99) // 100):
            # Catch the last group if it is less than 100 tweets
            end_loc = min((i + 1) * 100, tweet_count)
            full_tweets.extend(
                api.statuses_lookup(tweet_IDs[i * 100:end_loc])
            )
    except tweepy.TweepError:
        print('Something went wrong, quitting...')
    # Return whatever was collected, so the caller always gets a list
    return full_tweets

consumer_key = 'XXX'
consumer_secret = 'XXX'
access_token = 'XXX'
access_token_secret = 'XXX'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# do whatever it is to get por.TweetID - the list of all IDs to look up

results = lookup_tweets(list(por.TweetID), api)  # pass a plain list of IDs

for tweet in results:
    if tweet:
        print(tweet.text)
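If you want to check the batching logic on its own, without calling the API, here is a small sketch. The `batches` helper is a hypothetical name of mine, and the ID list is just made-up integers standing in for tweet IDs. It reproduces the 130-ID case mentioned above.

```python
def batches(ids, size=100):
    """Yield consecutive slices of `ids`, each at most `size` items long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


ids = list(range(130))  # stand-in for 130 tweet IDs
sizes = [len(batch) for batch in batches(ids)]
print(sizes)  # [100, 30]
```

Stepping through the list with `range(0, len(ids), size)` also means an empty list produces no batches at all, so no request would be sent for it.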

Upvotes: 10
