Reputation: 31
With a basic Academic Research developer account, I'm using Tweepy to collect tweets containing specified keywords or hashtags. The account allows me to collect 10,000,000 tweets per month. Using the full-archive search, I'm trying to collect tweets from one whole calendar date at a time. I first got a rate limit error (despite wait_on_rate_limit being set to True), and now I'm getting an error about the request limit.
Here is the code:
import pandas as pd
import tweepy


# function to display data of each tweet
def printtweetdata(n, ith_tweet):
    print()
    print(f"Tweet {n}:")
    print(f"Username:{ith_tweet[0]}")
    print(f"tweet_ID:{ith_tweet[1]}")
    print(f"userID:{ith_tweet[2]}")
    print(f"creation:{ith_tweet[3]}")
    print(f"location:{ith_tweet[4]}")
    # index 5 holds the tweet text (see the column order in scrape)
    print(f"text:{ith_tweet[5]}")
    print(f"likes:{ith_tweet[6]}")
    print(f"retweets:{ith_tweet[7]}")
    print(f"hashtag:{ith_tweet[8]}")


# function to perform data extraction
def scrape(words, numtweet, since_date, until_date):
    # Creating DataFrame using pandas
    db = pd.DataFrame(columns=['username', 'tweet_ID', 'userID',
                               'creation', 'location', 'text', 'likes', 'retweets', 'hashtags'])

    # We are using .Cursor() to search through twitter for the required tweets.
    # The number of tweets can be restricted using .items(number of tweets)
    tweets = tweepy.Cursor(api.search_full_archive, 'research', query=words,
                           fromDate=since_date, toDate=until_date).items(numtweet)

    # .Cursor() returns an iterable object. Each item in
    # the iterator has various attributes that you can access to
    # get information about each tweet
    list_tweets = [tweet for tweet in tweets]

    # Counter to maintain Tweet Count
    i = 1

    # we will iterate over each tweet in the list for extracting information about each tweet
    for tweet in list_tweets:
        username = tweet.user.screen_name
        tweet_ID = tweet.id
        userID = tweet.author.id
        creation = tweet.created_at
        location = tweet.user.location
        likes = tweet.favorite_count
        retweets = tweet.retweet_count
        hashtags = tweet.entities['hashtags']

        # Retweets can be distinguished by a retweeted_status attribute,
        # in case it is an invalid reference, except block will be executed
        try:
            text = tweet.retweeted_status.full_text
        except AttributeError:
            text = tweet.text

        hashtext = list()
        for j in range(0, len(hashtags)):
            hashtext.append(hashtags[j]['text'])

        # Here we are appending all the extracted information in the DataFrame
        ith_tweet = [username, tweet_ID, userID,
                     creation, location, text, likes, retweets, hashtext]
        db.loc[len(db)] = ith_tweet

        # Function call to print tweet data on screen
        printtweetdata(i, ith_tweet)
        i = i + 1

    filename = 'C:/Users/USER/Desktop/الجامعة الالمانية/output/twitter.csv'

    # we will save our database as a CSV file.
    db.to_csv(filename)


if __name__ == '__main__':
    consumer_key = "####"
    consumer_secret = "###"
    access_token = "###"
    access_token_secret = "###"

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    since_date = '200701010000'
    until_date = '202101012359'
    words = "#USA"

    # number of tweets you want to extract in one run
    numtweet = 1000
    scrape(words, numtweet, since_date, until_date)
    print('Scraping has completed!')
I got this error:
TooManyRequests: 429 Too Many Requests
Request exceeds account’s current package request limits. Please upgrade your package and retry or contact Twitter about enterprise access.
Upvotes: 0
Views: 9935
Reputation: 91
Unfortunately, I believe this is due to the sandbox quota on the full-archive search endpoint; a paid premium account would have a higher limit. See the Tweepy API documentation.
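To get a feel for how quickly a monthly request quota is used up, here is a rough sketch that estimates how many paged requests a run needs. The figures PAGE_SIZE = 100 and REQUESTS_LEFT = 50 are assumptions for illustration only, and requests_needed and fits_quota are hypothetical helper names; check your own developer dashboard for the real numbers on your account.

```python
import math


def requests_needed(num_tweets, page_size):
    """Number of paged requests needed to fetch num_tweets tweets."""
    if num_tweets < 0 or page_size <= 0:
        raise ValueError("num_tweets must be >= 0 and page_size must be > 0")
    return math.ceil(num_tweets / page_size)


def fits_quota(num_tweets, page_size, requests_left):
    """True if the run fits into the remaining request quota."""
    return requests_needed(num_tweets, page_size) <= requests_left


# Assumed values, not taken from the question: adjust to your account.
PAGE_SIZE = 100      # tweets returned per request (assumption)
REQUESTS_LEFT = 50   # requests left in the current month (assumption)

numtweet = 1000      # same value as in the question
print(requests_needed(numtweet, PAGE_SIZE))            # 10
print(fits_quota(numtweet, PAGE_SIZE, REQUESTS_LEFT))  # True
```

Under these assumed numbers, a single run with numtweet = 1000 costs 10 requests, so a handful of test runs would be enough to exhaust the month's allowance. Note that wait_on_rate_limit only helps with the short rate-limit windows, not with a monthly package limit.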
You may also want to check out this answer about the limit - Limit
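Since you want one calendar date at a time, it may also help to build the fromDate/toDate pair per day rather than spanning several years in one query, so each run stays small. This is only a sketch: day_window and day_windows are hypothetical helpers, the format string mirrors the yyyyMMddHHmm values used in the question, and I'm assuming the end bound is treated as exclusive (if it turns out to be inclusive, subtract one minute from the end).

```python
from datetime import date, datetime, time, timedelta

# Same shape as since_date / until_date in the question, e.g. '202101010000'
FMT = "%Y%m%d%H%M"


def day_window(day):
    """Return (fromDate, toDate) strings covering one calendar day.

    Assumes the end bound is exclusive, so it is midnight of the next day.
    """
    start = datetime.combine(day, time.min)
    end = start + timedelta(days=1)
    return start.strftime(FMT), end.strftime(FMT)


def day_windows(first_day, last_day):
    """Yield one (fromDate, toDate) pair per day, both ends included."""
    day = first_day
    while day <= last_day:
        yield day_window(day)
        day += timedelta(days=1)


print(day_window(date(2021, 1, 1)))  # ('202101010000', '202101020000')
```

Each pair could then be passed as since_date and until_date to scrape(), keeping in mind that every call still counts against the request quota.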
Upvotes: 0