Matt M

Reputation: 309

Dealing With Connection Break in For Loop, Wrong Behaviour

I have the following for loop, which grabs the follower IDs for a series of users using Tweepy:

def download_followers(user, api):
    all_followers = []
    try:
        for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
            all_followers.extend(map(str, page))
        return all_followers
    except tweepy.TweepError:
        print('Could not access user {}. Skipping...'.format(user))

The function is called in the following way:

for username in lookup_users:
    user_followers = download_followers(username, main_api)
    if user_followers:

        new_followers = pd.DataFrame({
            "Handles": username,
            "Follower_ID": user_followers,
            "Start_Date": today})

        new_followers_df = new_followers_df.append(new_followers)


        print('Finished outputting: {} at {}'.format(username, datetime.now().strftime('%Y/%m/%d %H:%M:%S')))

Depending on the number of followers each user has, Twitter's API might have to be called two or three times to grab all of the user's followers.

Accordingly, once the rate limit is reached there is a 15-minute wait before another call is made to the API. This is handled by passing the following parameters to Tweepy:

main_api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

The result is something like this:

Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Finished outputting: @barackobama at 2017/07/01 10:36:07

In this case the API reached its rate limit twice, waiting 15 minutes each time before it finished grabbing all of @barackobama's followers.

However, sometimes the for loop fails, printing the message:

    'Could not access user @barackobama. Skipping...'

This is mainly due to a connection problem, the Twitter API not being sent the right request, or an account having so many followers that Tweepy cannot deal with it.

To account for a possible connection failure, I tried wrapping the API call in a `while True` loop, as follows:

def download_followers(user, api):
    all_followers = []
    while True:

        try:

            for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():

                all_followers.extend(map(str, page))

                return all_followers

        except tweepy.TweepError:
            print('Could not access user {}. Trying Again...'.format(user))
            continue
        break

However, with the function wrapped this way, the for loop no longer works properly: it iterates over each user just once, does not grab all of their followers, and moves on to the next user in the `lookup_users` list.

For example, instead of behaving the following way:

Rate limit reached. Sleeping for: 895
'Could not access user @barackobama. Trying again...'
Rate limit reached. Sleeping for: 895
Finished outputting: @barackobama at 2017/07/01 10:36:07
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Finished outputting: @donaldtrump at 2017/07/01 10:36:07

It acts the following way:

Finished outputting: @barackobama at 2017/07/01 10:36:07
Finished outputting: @donaldtrump at 2017/07/01 10:36:07
Finished outputting: @georgebush at 2017/07/01 10:36:07
Rate limit reached. Sleeping for: 895
Finished outputting: @richardnixon at 2017/07/01 10:41:08

Hence each user is processed only once.

Is there something I am doing wrong?

Upvotes: 0

Views: 78

Answers (1)

domi

Reputation: 566

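The return statement is inside the for loop, so the function returns after the first iteration, with only the first page of followers. Dedent the `return` by one level so that it runs only after the for loop has consumed every page.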
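
A minimal sketch of one way to restructure the retry, not a definitive implementation. To keep it runnable without Twitter credentials, the paging call is passed in as `fetch_pages`, a hypothetical callable that is not part of Tweepy; `retry_on`, `max_retries` and `delay` are also names I made up for illustration. With the setup from the question you would presumably pass something like `lambda u: tweepy.Cursor(api.followers_ids, screen_name=u).pages()` and `retry_on=tweepy.TweepError`.

```python
import time


def download_followers(user, fetch_pages, retry_on=Exception,
                       max_retries=3, delay=0):
    """Collect every page of follower IDs for `user`, retrying on failure.

    `fetch_pages` is a hypothetical callable (an assumption, not a Tweepy
    API) that takes a user name and returns an iterable of pages of IDs.
    """
    for attempt in range(1, max_retries + 1):
        # Reset on every attempt: the cursor restarts from the first page,
        # so keeping the old list would duplicate IDs after a retry.
        all_followers = []
        try:
            for page in fetch_pages(user):
                all_followers.extend(map(str, page))
        except retry_on:
            print('Could not access user {}. Trying again ({}/{})...'
                  .format(user, attempt, max_retries))
            time.sleep(delay)
            continue
        # Reached only after the for loop has consumed every page.
        return all_followers
    # Every attempt failed; the caller's `if user_followers:` skips the user.
    return None
```

Capping the number of attempts instead of using `while True` avoids looping forever on a user that can never be accessed (for example a protected or deleted account); that is a design choice on my part, not something the question requires.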

Upvotes: 1
