Reputation: 371
I am trying to get all tweets from a specific user:
import json
from twarc import Twarc2

def get_all_tweets(user_id, DEBUG):
    # Your bearer token here
    t = Twarc2(bearer_token="blah")
    # Initialize a list to hold all the response pages
    alltweets = []
    new_tweets = {}
    if DEBUG:
        # Debug: read from file
        f = open('tweets_debug.txt')
        new_tweets = json.load(f)
        alltweets.extend(new_tweets)
    else:
        # Make initial request for most recent tweets (3200 is the maximum allowed count)
        new_tweets = t.timeline(user=user_id)
        # Save most recent tweets
        alltweets.extend(new_tweets)
    if DEBUG:
        # Debug: write to file
        f = open("tweets_debug.txt", "w")
        f.write(json.dumps(alltweets, indent=2, sort_keys=False))
        f.close()
    # Save the id of the oldest tweet less one
    oldest = str(int(alltweets[-1]['meta']['oldest_id']) - 1)
    # Keep grabbing tweets until there are no tweets left to grab
    while len(dict(new_tweets)) > 0:
        print(f"getting tweets before {oldest}")
        # All subsequent requests use the until_id param to prevent duplicates
        new_tweets = t.timeline(user=user_id, until_id=oldest)
        # Save most recent tweets
        alltweets.extend(new_tweets)
        # Update the id of the oldest tweet less one
        oldest = str(int(alltweets[-1]['meta']['oldest_id']) - 1)
        print(f"...{len(alltweets)} tweets downloaded so far")
    res = []
    for tweetlist in alltweets:
        res.extend(tweetlist['data'])
    f = open("output.txt", "w")
    f.write(json.dumps(res, indent=2, sort_keys=False))
    f.close()
    return res
However, len(dict(new_tweets)) does not work: it always returns 0. sum(1 for dummy in new_tweets) also returns 0.
I tried json.load(new_tweets), and that does not work either.
However, alltweets.extend(new_tweets) works properly.
It seems that timeline() returns a generator (<generator object Twarc2._timeline at 0x000001D78B3D8B30>). Is there any way to count its length to determine whether there are more tweets left to grab?
Or is there any way to merge...
someList = []
someList.extend(new_tweets)
while len(someList) > 0:
    # blah blah
...into one line with while?
Edit: I tried print(list(new_tweets)) right before the while loop, and it prints []. It seems the object is actually empty.
Is it because alltweets.extend(new_tweets) somehow consumes the new_tweets generator?
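That suspicion can be checked without touching the Twitter API. Below is a minimal sketch with a plain Python generator; fake_timeline is a hypothetical stand-in for t.timeline() and is not part of twarc, and the page contents are made up:

```python
def fake_timeline():
    # Hypothetical stand-in for t.timeline(): yields two made-up response pages.
    yield {"data": [{"id": "3"}, {"id": "2"}], "meta": {"oldest_id": "2"}}
    yield {"data": [{"id": "1"}], "meta": {"oldest_id": "1"}}

pages = fake_timeline()
alltweets = []
alltweets.extend(pages)   # extend() iterates the generator all the way to the end

remaining = list(pages)   # the same generator object has nothing left to yield
print(len(alltweets), len(remaining))
```

A generator can only be iterated once, so whichever call walks it first gets all the items and every later call sees it as empty.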
Upvotes: 1
Reputation: 371
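I figured it out myself. alltweets.extend(new_tweets) consumes the generator, so nothing is left in it by the time the while condition is evaluated. Converting the generator to a list first solves the problem:
new_tweets = list(t.timeline(user=user_id, until_id=oldest))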
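Materializing the generator also makes it possible to fold the emptiness check into the while condition, which is what the question asked about. The following is only a sketch, assuming Python 3.8+ for the := operator. fetch_pages is a hypothetical stand-in for t.timeline(user=..., until_id=...), FAKE_IDS is made-up data, and the stand-in treats until_id as exclusive, so no "minus one" adjustment is needed here:

```python
from typing import Dict, Iterator, List, Optional

# Made-up tweet ids, newest first, purely for illustration.
FAKE_IDS = [105, 104, 103, 102, 101]

def fetch_pages(until_id: Optional[int] = None, page_size: int = 2) -> Iterator[Dict]:
    """Hypothetical stand-in: yields at most one page of tweets older than until_id."""
    ids = [i for i in FAKE_IDS if until_id is None or i < until_id]
    page = ids[:page_size]
    if page:
        yield {"data": [{"id": str(i)} for i in page],
               "meta": {"oldest_id": str(page[-1])}}

def collect_all() -> List[Dict]:
    all_pages: List[Dict] = []
    until_id = None
    # list(...) turns the generator into a list once, so the same object can be
    # tested for emptiness by the while condition and then stored.
    while new_pages := list(fetch_pages(until_id)):
        all_pages.extend(new_pages)
        until_id = int(new_pages[-1]["meta"]["oldest_id"])
    # Flatten the pages into a single list of tweets.
    return [tweet for page in all_pages for tweet in page["data"]]

tweets = collect_all()
print([t["id"] for t in tweets])
```

The loop stops as soon as a request comes back with no pages, because an empty list is falsy.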
Upvotes: 0