A.S.J

Reputation: 637

tweepy: get all mentions with api.search using max_id and since_id

I followed this link to get all tweets that mention a certain query. The code works fine so far, but I want to make sure I actually understand it, since I don't want to use code without knowing how it does what it does. This is my relevant code:

def searchMentions(tweetCount, maxTweets, searchQuery, tweetsPerQry, max_id, sinceId):
    while tweetCount < maxTweets:
        if not max_id:
            if not sinceId:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry)
            else:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry, since_id=sinceId)
        else:
            if not sinceId:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry, max_id=str(max_id - 1))
            else:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry, max_id=str(max_id - 1), since_id=sinceId)

        if not new_tweets:
            print("No new tweets to show")
            break

        for tweet in new_tweets:
            try:
                tweetCount += len(new_tweets)
                max_id = new_tweets[-1].id

                tweetId = tweet.user.id
                username = tweet.user.screen_name
                api.update_status(tweet.text)
                print(tweet.text)

            except tweepy.TweepError as e:
                print(e.reason)

            except StopIteration:
                pass

max_id and sinceId are both set to None since no tweets have been found yet, I assume. tweetCount is set to zero. The way I understand it, is that the while-loop runs while tweetCount < maxTweets. I'm not exactly sure why that is the case and why I can't use while True, for instance. At first I thought maybe it has to do with the rate of api calls but that doesn't really make sense.

Afterwards, the function checks for max_id and sinceId. I assume it checks whether there is already a max_id, and if max_id is None, it checks for sinceId. If sinceId is None, it simply gets however many tweets the count parameter is set to; otherwise it sets the lower bound to sinceId and gets however many tweets the count parameter is set to from sinceId on. If max_id is not None but sinceId is None, it sets the upper limit to max_id and gets a certain number of tweets up to and including that bound. So if you had tweets with the ids 1,2,3,4,5 and with count=3 and max_id=5 you would get the tweets 3,4,5. Otherwise it sets the lower bound to sinceId and the upper bound to max_id and gets the tweets "in between". Tweets that are found are saved in new_tweets.

Now, the function iterates through all tweets in new_tweets and sets the tweetCount to the length of this list. Then max_id is set to new_tweets[-1].id. Since Twitter specifies that max_id is inclusive, I assume this is set to the next tweet before the last tweet so tweets aren't repeated. However, I'm not so sure about it, and I don't understand how my function would know what the id before the last tweet could be. A tweet that repeats whatever the tweet in new_tweets said is posted. So, to sum it up, my questions are:

  1. Can I do while True instead of while tweetCount < maxTweets and if not, why?
  2. Is the way I explained the function correct, if not, where did I go wrong?
  3. What does max_id = new_tweets[-1].id do exactly?
  4. Why do we not set sinceId to a new value in the for-loop? Since sinceId is set to None in the beginning, it seems unnecessary to go through the options of sinceId not being set to None if we do not change the value anywhere.

As a disclaimer: I did read through Twitter's explanation of max_id, since_id, count, etc., but it did not answer my questions.

Upvotes: 0

Views: 2297

Answers (2)

Shashank Yadav

Reputation: 204

A few months ago, I used the same reference for the Search API. I came to understand a few things that might help you. I have assumed that the API returns tweets in an orderly fashion (descending order of tweet_id).

Let's assume we have a bunch of tweets that Twitter is giving us for a query, with the tweet ids from 1 to 10 (1 being the oldest and 10 the newest).

1 2 3 4 5 6 7 8 9 10

since_id = lower bound and max_id = upper bound

Twitter starts to return the tweets in the order of newest to oldest ( from 10 to 1 ). Let's take some examples:

# This would return tweets having id from 5 to 10 ( since_id is exclusive, max_id is inclusive )
since_id=4,max_id=10

# This means there is no lower bound, and we will receive as many 
# tweets as the Twitter Search API permits for the free version ( i.e. for the last 7 
# days ). Hence, we will get tweets with id 1 to 10 ( 1 and 10 inclusive )
since_id=None, max_id=10
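
To make the bounds concrete, here is a minimal sketch that simulates the search over the ten example ids without calling Twitter. fake_search is a hypothetical stand-in for api.search, not part of tweepy. It only mimics the newest-first ordering, treats max_id as inclusive, and treats since_id as exclusive (Twitter's documentation describes since_id as returning ids greater than the given value).

```python
def fake_search(all_ids, count, max_id=None, since_id=None):
    """Hypothetical stand-in for api.search: returns ids newest first."""
    matches = [
        i for i in sorted(all_ids, reverse=True)
        if (max_id is None or i <= max_id)        # max_id is inclusive
        and (since_id is None or i > since_id)    # since_id is exclusive
    ]
    return matches[:count]


ids = range(1, 11)
both_bounds = fake_search(ids, count=100, max_id=10, since_id=4)
no_lower_bound = fake_search(ids, count=100, max_id=10)
limited = fake_search(ids, count=3, max_id=5)

print(both_bounds)     # [10, 9, 8, 7, 6, 5]
print(no_lower_bound)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(limited)         # [5, 4, 3]
```

The last call matches the example in the question: with ids 1 to 5 in range, count=3 and max_id=5, the three newest ids at or below the bound come back.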

What does max_id = new_tweets[-1].id do exactly?

Suppose in the first API call we received only 4 tweets, i.e. 10, 9, 8, 7. Hence, the new_tweets list becomes (I am assuming it to be a list of ids for the purpose of explanation; it is actually a list of objects):

new_tweets=[10,9,8,7] 
max_id= new_tweets[-1]   # max_id = 7

Now when our program hits the API for the second time:

max_id = 7
since_id = None

new_tweets = api.search(q=searchQuery, count=tweetsPerQry, max_id=str(max_id -1), since_id=sinceId)

# We will receive all tweets from 6 to 1 now.
max_id = 6  # max_id=str(max_id -1)
#Therefore
new_tweets = [6,5,4,3,2,1]

This way of using the API (as mentioned in the reference) can return a maximum of 100 tweets for every API call we make. The actual number of tweets returned is often less than 100 and also depends on how complex your query is; the less complex, the better.
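
The paging walked through above can be sketched as a loop that keeps moving max_id down. This is a simulation only: fake_search and fetch_all are hypothetical helpers over the ids 1 to 10, not tweepy calls, and the page size of 4 is chosen to mirror the example.

```python
def fake_search(count, max_id=None):
    """Hypothetical stand-in for api.search over tweet ids 1..10, newest first."""
    ids = [i for i in range(10, 0, -1) if max_id is None or i <= max_id]
    return ids[:count]


def fetch_all(page_size):
    collected = []
    max_id = None
    while True:
        if max_id is None:
            page = fake_search(page_size)
        else:
            # max_id is inclusive, so subtract 1 to skip the tweet already seen
            page = fake_search(page_size, max_id=max_id - 1)
        if not page:
            break               # nothing older is left
        collected.extend(page)
        max_id = page[-1]       # oldest id in this page
    return collected


pages = fetch_all(page_size=4)
print(pages)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```

No id appears twice, because each request asks for ids strictly below the oldest one already collected.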

Why do we not set sinceId to a new value in the for-loop? Since sinceId is set to None in the beginning, it seems unnecessary to go through the options of sinceId not being set to None if we do not change the value anywhere.

With sinceId=None there is no lower bound, so the search reaches back to the oldest tweets available, but I am unsure of what the default value of sinceId is if we don't mention it.
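
One possible reason to keep the sinceId branches, even though this loop never changes the value, is to resume on a later run: remember the newest id you have seen and pass it as since_id next time. That is an assumption about intent, not something the question's code does. The sketch below simulates it with a hypothetical fake_search helper rather than a real tweepy call.

```python
def fake_search(all_ids, since_id=None):
    """Hypothetical stand-in for api.search: newest first, since_id exclusive."""
    return [
        i for i in sorted(all_ids, reverse=True)
        if since_id is None or i > since_id
    ]


first_run = fake_search(range(1, 11))   # since_id=None: everything available
newest_seen = first_run[0]              # results are newest first

# Later, tweets 11 and 12 have been posted.
second_run = fake_search(range(1, 13), since_id=newest_seen)
print(second_run)  # [12, 11]
```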

Can I do while True instead of while tweetCount < maxTweets and if not, why?

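You can do this, but you then need to handle the exceptions that you'll get when you reach the rate limit. Note that the rate limit caps the number of requests you may make in a time window; the figure of 100 tweets is the maximum per call, not the rate limit itself. Bounding the loop with maxTweets makes the program easier to handle.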
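
A minimal sketch of the difference, again simulated rather than run against Twitter. FakeRateLimitError, make_fake_search and collect are hypothetical names; the quota of 5 calls and the pool of 1000 tweets are made-up numbers for illustration. Passing max_tweets=None behaves like while True.

```python
class FakeRateLimitError(Exception):
    """Hypothetical stand-in for the error raised when the request quota is used up."""


def make_fake_search(total_tweets, allowed_calls):
    calls = {"n": 0}

    def fake_search(count, max_id=None):
        if calls["n"] >= allowed_calls:
            raise FakeRateLimitError("request quota exhausted")
        calls["n"] += 1
        newest = total_tweets if max_id is None else min(max_id, total_tweets)
        return list(range(newest, max(newest - count, 0), -1))

    return fake_search


def collect(search, page_size, max_tweets=None):
    collected, max_id = [], None
    try:
        # max_tweets=None means "loop until the results run out"
        while max_tweets is None or len(collected) < max_tweets:
            if max_id is None:
                page = search(page_size)
            else:
                page = search(page_size, max_id=max_id - 1)
            if not page:
                break
            collected.extend(page)
            max_id = page[-1]
    except FakeRateLimitError:
        pass  # keep whatever was fetched before the quota ran out
    return collected


capped = collect(make_fake_search(1000, allowed_calls=5), page_size=100, max_tweets=300)
uncapped = collect(make_fake_search(1000, allowed_calls=5), page_size=100)
print(len(capped), len(uncapped))  # 300 500
```

The capped run stops after three requests and never touches the quota, while the uncapped run only stops because the simulated limit is hit.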

I hope this helps you.

Upvotes: 0

JulianP

Reputation: 97

Can I do while True instead of while tweetCount < maxTweets and if not, why?

It's been a while since I used the Twitter API, but if I recall correctly, you have a limited number of calls and tweets in a given time window. This is to keep Twitter relatively clean. I recall maxTweets should be the number of tweets you want to fetch. That's why you probably wouldn't want to use while True, but I believe you can replace it without any problems. You'll hit an exception eventually; that will be the API telling you that you reached your maximum.

What does max_id = new_tweets[-1].id do exactly?

Every tweet has an ID; that's the one you see in the URL when you open it. You use it to reference a specific tweet in your code. What that code does is set the variable max_id to the ID of the last tweet in the returned list (basically, it updates the variable). Remember that negative indexes refer to elements counted backwards from the end of the list.
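
As a small illustration of the negative index, here is a sketch with a hypothetical Tweet namedtuple standing in for the objects tweepy returns (real status objects carry many more fields):

```python
from collections import namedtuple

# Hypothetical minimal tweet object, for illustration only
Tweet = namedtuple("Tweet", ["id", "text"])

new_tweets = [Tweet(10, "newest"), Tweet(9, "middle"), Tweet(7, "oldest in page")]

max_id = new_tweets[-1].id   # last element of the list
print(max_id)                # 7
print(max_id - 1)            # 6, the upper bound for the next request
```

Because results arrive newest first, the last element is the oldest tweet in the page, so its ID minus one is where the next request should start.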

I am not 100% sure about your other two questions, I'll edit later if I find anything.

Upvotes: 0
