amarokWPcom

Reputation: 81

How can I scrape multiple followers with Tweepy?

The Twitter API rate limit only lets me scrape about 15-20 pages of 200 followers each; after that, a 15-minute break is enforced.

When I rerun my code, I don't want to start again at the beginning of the follower list. How can I resume where I left off before hitting the limit?

My code looks like this, using an example Twitter user who has 4.8 million followers.

import tweepy
import pandas as pd

# api is an authenticated tweepy.API instance created elsewhere

def fetch_followers_to_df():
    try:
        screen_name = "JeffBezos"

        # fetching the user to get the total follower count
        user = api.get_user(screen_name=screen_name)
        num_followers = user.followers_count
        print(user.id)
        print(num_followers)

        # empty input means "fetch all pages", at 200 followers per page
        cnt_followers = input()
        if cnt_followers == "":
            cnt_pages = num_followers // 200
        else:
            cnt_pages = int(cnt_followers)
        print(cnt_pages)

        df_followers = pd.DataFrame(columns=["User", "ID", "Bio", "Location"])

        for page in tweepy.Cursor(
            api.get_followers, screen_name=screen_name, count=200
        ).pages(cnt_pages):
            for user in page:

                temp_dict = {
                    "User": user.screen_name,
                    "ID": user.id,
                    "Bio": user.description,
                    "Location": user.location,
                }
                temp_df = pd.DataFrame(data=temp_dict, index=[0])
                df_followers = pd.concat([df_followers, temp_df])

            print(len(page))

        df_followers.to_csv("raw_followers.csv", index=False)

    except tweepy.TweepyException:
        # rate limit hit: fall back to what was saved on a previous run
        print("Wait for new API Limit!")
        df_followers = pd.read_csv("raw_followers.csv")

    return df_followers
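
For context, the followers/list endpoint is cursor-paginated, and Tweepy's Cursor accepts a cursor keyword to start from a saved position. A minimal sketch of resuming across runs (not from the original post; it assumes api is an authenticated tweepy.API instance and uses a hypothetical followers_cursor.json checkpoint file):

import json
import tweepy

CURSOR_FILE = "followers_cursor.json"  # hypothetical checkpoint file

def load_cursor():
    try:
        with open(CURSOR_FILE) as f:
            return json.load(f)["next_cursor"]
    except FileNotFoundError:
        return -1  # -1 means "start at the first page"

def save_cursor(next_cursor):
    with open(CURSOR_FILE, "w") as f:
        json.dump({"next_cursor": next_cursor}, f)

# start (or resume) paging from the saved cursor position
pages = tweepy.Cursor(
    api.get_followers, screen_name="JeffBezos", count=200, cursor=load_cursor()
).pages()

while True:
    try:
        page = next(pages)
    except StopIteration:
        break  # all followers fetched
    except tweepy.TooManyRequests:
        # the checkpoint from the last full page already points at this page
        print("Rate limited, resume from the saved cursor on the next run.")
        break
    for user in page:
        print(user.screen_name)
    save_cursor(pages.next_cursor)  # checkpoint after every full page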

Upvotes: 0

Views: 372

Answers (1)

Mickaël Martinez

Reputation: 1843

Set the wait_on_rate_limit argument to True when you initialize tweepy.API (see the Tweepy documentation).
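
For reference, a minimal sketch of that initialization (the four OAuth 1.0a credentials are placeholders):

import tweepy

auth = tweepy.OAuth1UserHandler(
    CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET
)
# with wait_on_rate_limit=True, Tweepy sleeps until the rate-limit window
# resets instead of raising an error, so a Cursor loop simply continues
api = tweepy.API(auth, wait_on_rate_limit=True)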

On a side note, you don't need the cnt_pages argument for pages() (the default is inf), and you would go much faster by using the methods of the Twitter API V2.
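
A rough sketch of the V2 route, assuming a BEARER_TOKEN placeholder (GET /2/users/:id/followers returns up to 1000 users per request, versus 200 for the V1.1 followers/list endpoint):

import tweepy

client = tweepy.Client(bearer_token=BEARER_TOKEN, wait_on_rate_limit=True)

user_id = client.get_user(username="JeffBezos").data.id

# Paginator handles the pagination tokens; max_results=1000 is the V2 cap
for response in tweepy.Paginator(
    client.get_users_followers,
    user_id,
    user_fields=["description", "location"],
    max_results=1000,
):
    for follower in response.data or []:
        print(follower.username, follower.description, follower.location)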

Upvotes: 1
