Reputation: 309
I have the following for loop, which grabs the follower IDs for a series of users using Tweepy:
def download_followers(user, api):
    all_followers = []
    try:
        for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
            all_followers.extend(map(str, page))
        return all_followers
    except tweepy.TweepError:
        print('Could not access user {}. Skipping...'.format(user))
The function is called in the following way:
for username in lookup_users:
    user_followers = download_followers(username, main_api)
    if user_followers:
        new_followers = pd.DataFrame({
            "Handles": username,
            "Follower_ID": user_followers,
            "Start_Date": today})
        new_followers_df = new_followers_df.append(new_followers)
        print('Finished outputting: {} at {}'.format(username, datetime.now().strftime('%Y/%m/%d %H:%M:%S')))
Depending on how many followers each user has, Twitter's API might have to be called two or three times to grab all of the user's followers.
When the rate limit is hit, there is a wait of 15 minutes before another call is made to the API. This is handled by passing the following parameters to Tweepy:
main_api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
The result is something like this:
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Finished outputting: @barackobama at 2017/07/01 10:36:07
In this case the API reached its limit twice, waiting 15 minutes each time before grabbing all of @barackobama's followers.
However, sometimes the for loop fails and prints the message:
'Could not access user @barackobama. Skipping...'
This is mainly due to a connection problem, the Twitter API not being sent the right request, or an account having so many followers that Tweepy cannot deal with it properly.
To account for a possible connection failure, I tried wrapping the API call in a while True loop, as follows:
def download_followers(user, api):
    all_followers = []
    while True:
        try:
            for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
                all_followers.extend(map(str, page))
                return all_followers
        except tweepy.TweepError:
            print('Could not access user {}. Trying Again...'.format(user))
            continue
        break
However, with the function wrapped this way, the for loop does not work properly. It iterates over each user just once, does not grab all of that user's followers, and moves on to the next user in the lookup_users list.
For example, instead of behaving the following way:
Rate limit reached. Sleeping for: 895
'Could not access user @barackobama. Trying again...'
Rate limit reached. Sleeping for: 895
Finished outputting: @barackobama at 2017/07/01 10:36:07
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Finished outputting: @donaldtrump at 2017/07/01 10:36:07
It acts the following way:
Finished outputting: @barackobama at 2017/07/01 10:36:07
Finished outputting: @donaldtrump at 2017/07/01 10:36:07
Finished outputting: @georgebush at 2017/07/01 10:36:07
Rate limit reached. Sleeping for: 895
Finished outputting: @richardnixon at 2017/07/01 10:41:08
So each user is processed only once, without all of their followers being fetched.
Is there something I am doing wrong?
Upvotes: 0
Views: 78
Reputation: 566
The return statement is inside the for loop, so the program exits the for loop after the first iteration.
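A minimal sketch of that fix, written without a Tweepy dependency so it can be run on its own. The helper name collect_all_pages and its parameters (make_pages, retry_on, max_retries, wait_seconds) are hypothetical and not part of Tweepy. It places the return after the for loop has consumed every page, resets the list on each attempt so a retry does not duplicate IDs, and caps the number of retries instead of looping forever:

```python
import time


def collect_all_pages(make_pages, retry_on=(Exception,), max_retries=3, wait_seconds=5):
    """Collect the items from every page, restarting from scratch on failure.

    make_pages: hypothetical zero-argument callable returning a fresh page iterator.
    retry_on:   tuple of exception types that should trigger a retry.
    """
    for attempt in range(1, max_retries + 1):
        items = []  # reset on every attempt so a retry does not duplicate items
        try:
            for page in make_pages():
                items.extend(map(str, page))
            # The return sits AFTER the for loop, so all pages are consumed first.
            return items
        except retry_on as exc:
            print('Attempt {} of {} failed: {}'.format(attempt, max_retries, exc))
            if attempt < max_retries:
                time.sleep(wait_seconds)
    return None  # every attempt failed; the caller decides what to do
```

With the code from the question, this would presumably be called as collect_all_pages(lambda: tweepy.Cursor(api.followers_ids, screen_name=user).pages(), retry_on=(tweepy.TweepError,)). Since it returns None when every attempt fails, the existing if user_followers: check in the calling loop would skip that user.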
Upvotes: 1