TaihouKai

Reputation: 371

Length of timeline() of Twitter API

I am trying to get all tweets from a specific user:

import json

from twarc.client2 import Twarc2


def get_all_tweets(user_id, DEBUG):
    # Your bearer token here
    t = Twarc2(bearer_token="blah")

    # Initialize a list to hold all the timeline pages returned by twarc2
    alltweets = []
    new_tweets = {}

    if DEBUG:
        # Debug: read from file
        with open("tweets_debug.txt") as f:
            new_tweets = json.load(f)
        alltweets.extend(new_tweets)
    else:
        # Make initial request for the most recent tweets
        # (the API only returns a user's 3200 most recent tweets)
        new_tweets = t.timeline(user=user_id)
        # Save most recent tweets
        alltweets.extend(new_tweets)

    if DEBUG:
        # Debug: write to file
        with open("tweets_debug.txt", "w") as f:
            f.write(json.dumps(alltweets, indent=2, sort_keys=False))

    # Save the id of the oldest tweet less one
    oldest = str(int(alltweets[-1]['meta']['oldest_id']) - 1)

    # Keep grabbing tweets until there are no tweets left to grab
    while len(dict(new_tweets)) > 0:
        print(f"getting tweets before {oldest}")

        # All subsequent requests use the until_id param to prevent duplicates
        new_tweets = t.timeline(user=user_id, until_id=oldest)

        # Save most recent tweets
        alltweets.extend(new_tweets)

        # Update the id of the oldest tweet less one
        oldest = str(int(alltweets[-1]['meta']['oldest_id']) - 1)

        print(f"...{len(alltweets)} tweets downloaded so far")

    # Flatten the pages into a single list of tweets
    res = []
    for tweetlist in alltweets:
        res.extend(tweetlist['data'])

    with open("output.txt", "w") as f:
        f.write(json.dumps(res, indent=2, sort_keys=False))

    return res

However, len(dict(new_tweets)) does not work: it always returns 0. sum(1 for dummy in new_tweets) also returns 0.

I tried json.load(new_tweets), and that does not work either.

On the other hand, alltweets.extend(new_tweets) works properly.

It seems that timeline() returns a generator (<generator object Twarc2._timeline at 0x000001D78B3D8B30>). Is there any way to count its length, so I can determine whether there are any tweets left to grab?
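A generator has no len(). Counting it, as sum(1 for dummy in new_tweets) does, iterates it to the end, and afterwards nothing is left. If the goal is only to know whether anything remains, one standard Python pattern is to peek at the first item with next(gen, None) and then chain it back in front of the rest with itertools.chain. This is a minimal sketch; fake_pages is a hypothetical stand-in for t.timeline(...), not part of twarc2.

```python
import itertools
from typing import Iterator, Optional, Tuple


def fake_pages(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for t.timeline(...): a generator of n page dicts."""
    for i in range(n):
        yield {"data": [{"id": str(i)}], "meta": {"result_count": 1}}


def peek(gen: Iterator[dict]) -> Tuple[Optional[dict], Iterator[dict]]:
    """Return (first item or None, an iterator that still yields every item)."""
    first = next(gen, None)  # consumes at most one item
    if first is None:
        return None, iter(())  # the generator was already empty
    # Put the peeked item back in front of the rest of the generator.
    return first, itertools.chain([first], gen)


first, pages = peek(fake_pages(3))
print(first is not None)  # True: at least one page is available
print(len(list(pages)))   # 3: the peeked page was not lost

first, pages = peek(fake_pages(0))
print(first is None)      # True: nothing left to grab
```

The None sentinel is safe here because every page is a dict; if a generator could legitimately yield None, a dedicated sentinel object would be needed instead.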

Or, is there any way to merge...

someList = []
someList.extend(new_tweets)
while len(someList) > 0:
    # blah blah

...into one line with while?
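On Python 3.8 and later, an assignment expression (:=) can materialize a batch and test it in the while condition itself, since an empty list is falsy. A minimal sketch; fetch_batch is a hypothetical generator standing in for a call such as t.timeline(user=user_id, until_id=oldest).

```python
from typing import Iterator, List


def fetch_batch(remaining: List[int], size: int = 2) -> Iterator[int]:
    """Hypothetical stand-in for one timeline request: yields up to `size` items."""
    for _ in range(min(size, len(remaining))):
        yield remaining.pop(0)


source = [1, 2, 3, 4, 5]
some_list: List[int] = []

# list(...) reads the generator exactly once; := binds the result to `batch`,
# and the loop stops as soon as a request comes back empty.
while batch := list(fetch_batch(source)):
    some_list.extend(batch)

print(some_list)  # [1, 2, 3, 4, 5]
```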


Edit: I tried print(list(new_tweets)) right before the while loop, and it printed []. It seems the object is actually empty.

Is it because alltweets.extend(new_tweets) somehow consumes the new_tweets generator?
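This toy example (plain Python, no Twitter API involved) illustrates the behavior being asked about: a generator can be iterated only once, and list.extend iterates it to the end. After extend, list(), dict() and sum(...) on the same generator all see nothing, while a list built once keeps its contents.

```python
def make_gen():
    """A generator expression: each value is produced exactly once."""
    return (n for n in range(3))


gen = make_gen()
collected = []
collected.extend(gen)        # extend() iterates the generator to the end

print(collected)             # [0, 1, 2]
print(list(gen))             # []: the generator is exhausted
print(len(dict(gen)))        # 0: dict() of an exhausted generator is {}
print(sum(1 for _ in gen))   # 0

# Materializing once gives an object that can be read repeatedly.
items = list(make_gen())
collected2 = []
collected2.extend(items)
print(len(items))            # 3: the list is unchanged by extend()
```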

Upvotes: 1

Views: 158

Answers (1)

TaihouKai

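I figured it out myself. A generator can be iterated only once, so alltweets.extend(new_tweets) exhausted it, and every later len(dict(...)) or list() call on it saw nothing. The fix is to convert the generator to a list as soon as it is returned (for the initial t.timeline(user=user_id) call as well), so the result can be both extended into alltweets and checked for length: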

new_tweets = list(t.timeline(user=user_id,until_id=oldest))
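Here is a minimal end-to-end sketch of the question's pagination loop with each generator wrapped in list(). Assumptions: fake_timeline is a hypothetical stand-in for t.timeline(user=..., until_id=...) that returns a generator of page dicts shaped like the ones the question reads ('data' and 'meta' with 'oldest_id'), serves one page of up to two tweets per call, and treats until_id as exclusive (only strictly older IDs), as the Twitter API v2 documentation describes; for that reason the sketch passes oldest_id unchanged instead of subtracting 1. Depending on the twarc2 version, a single real timeline() call may already follow pagination tokens internally, in which case the loop simply ends after its second request.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical tweet ids for one user, newest first.
FAKE_IDS = [105, 104, 103, 102, 101]


def fake_timeline(until_id: Optional[str] = None, page_size: int = 2) -> Iterator[Dict]:
    """Hypothetical stand-in for t.timeline(user=..., until_id=...).

    Yields at most one page per call and nothing once no older tweets remain.
    until_id is exclusive: only ids strictly smaller than it are returned.
    """
    ids = [i for i in FAKE_IDS if until_id is None or i < int(until_id)]
    if ids:
        chunk = ids[:page_size]
        yield {
            "data": [{"id": str(i)} for i in chunk],
            "meta": {"oldest_id": str(chunk[-1]), "result_count": len(chunk)},
        }


def get_all_tweets() -> List[Dict]:
    all_pages: List[Dict] = []

    # list() reads the generator once and keeps the pages, so they can be
    # extended into all_pages and still be tested for emptiness afterwards.
    pages = list(fake_timeline())
    all_pages.extend(pages)

    while pages:
        oldest = pages[-1]["meta"]["oldest_id"]
        print(f"getting tweets before {oldest}")
        pages = list(fake_timeline(until_id=oldest))
        all_pages.extend(pages)

    # Flatten the pages into a single list of tweets; skip pages without 'data'.
    return [tweet for page in all_pages for tweet in page.get("data", [])]


print([tw["id"] for tw in get_all_tweets()])  # ['105', '104', '103', '102', '101']
```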

Upvotes: 0
