Reputation: 559
I'm trying to analyze tweets that have the hashtag #contentmarketing. I first tried grabbing 20,000 tweets with tweepy but ran into the rate limit. So I'd like to take a random sample instead (or a couple random samples).
I'm not really familiar with random sampling through an API call. If I had an array that already contained the data, I would take random indices from that array without replacement. However, I don't think I can create that array in the first place without the rate limit kicking in.
Can anyone enlighten me on how to access random tweets (or random data from an API, overall)?
For reference, here's the code that got me in rate limit purgatory:
import json

import tweepy
from tweepy import OAuthHandler

consumerKey = 'my-key'
consumerSecret = 'my-key'
accessToken = 'my-key'
accessSecret = 'my-key'

auth = OAuthHandler(consumerKey, consumerSecret)
auth.set_access_token(accessToken, accessSecret)
api = tweepy.API(auth)

tweets = []
for tweet in tweepy.Cursor(api.search, q='#contentmarketing', count=20000,
                           lang='en', since='2017-06-20').items():
    # Keep the raw JSON dict; Status objects are not JSON-serializable
    tweets.append(tweet._json)

with open('content-tweets.json', 'w') as f:
    json.dump(tweets, f, sort_keys=True, indent=4)
Upvotes: 0
Views: 3170
Reputation: 3148
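I have never heard of a way to get random tweets. But you can keep collecting tweets indefinitely without ever retrieving all of them, which amounts to much the same thing.
With the public search API, you can make 450 requests within 15 minutes (app auth), so you can ask for 100 tweets every 2 seconds and keep going for as long as you like. With user auth, as in the question's OAuthHandler setup, the limit is lower (180 requests per 15 minutes), so the pause would need to be about 5 seconds.
So change the "count" parameter to 100 and add a time.sleep(2):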
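import time

# Iterate page by page so the pause happens once per request, not once per tweet
for page in tweepy.Cursor(api.search, q='#contentmarketing', count=100,
                          lang='en', since='2017-06-20').pages():
    tweets.extend(status._json for status in page)  # raw dicts, so json.dump works
    time.sleep(2)  # 2 s matches the app-auth limit of 450 requests per 15 minutes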
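To see where the 2-second pause comes from, here is a small sketch of the arithmetic. The helper name `seconds_between_requests` and the constants are made up for illustration; the 450-per-15-minutes figure is the app-auth limit quoted above, so swap in whatever limit applies to your credentials.

```python
def seconds_between_requests(max_requests, window_seconds):
    """Smallest pause that keeps evenly spaced requests within the limit."""
    return window_seconds / max_requests


# Assumed values: the app-auth search limit quoted in this answer
APP_AUTH_LIMIT = 450          # requests allowed per window
WINDOW_SECONDS = 15 * 60      # 15-minute window
TWEETS_PER_REQUEST = 100      # the "count" parameter

delay = seconds_between_requests(APP_AUTH_LIMIT, WINDOW_SECONDS)
tweets_per_window = APP_AUTH_LIMIT * TWEETS_PER_REQUEST

print(delay)              # pause in seconds between requests
print(tweets_per_window)  # upper bound on tweets per 15 minutes
```

With those numbers the 20,000 tweets from the question fit inside a single 15-minute window, as long as the requests are spaced out.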
Reference : https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
Upvotes: 1
Reputation: 711
This should keep you from running into a rate-limit error: tweepy will pause until the limit resets instead of failing. Just make the following change to your code:
api = tweepy.API(auth, wait_on_rate_limit=True)
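Once the cursor can run without dying on the rate limit, the "random sample" part of the question can be handled on your side. A rough sketch, assuming reservoir sampling (Algorithm R) is acceptable for your purposes: it keeps k items chosen uniformly from a stream of unknown length without storing the whole stream. `reservoir_sample` is a made-up helper, not part of tweepy, and the demo uses `range` as a stand-in for the tweet cursor. Note that this samples uniformly from whatever the search returns, not from all tweets ever posted.

```python
import random


def reservoir_sample(items, k, rng=random):
    """Return up to k items chosen uniformly at random from an iterable."""
    sample = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items
            sample.append(item)
        else:
            # Replace a random slot with probability k / (i + 1)
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Stand-in for the tweet stream; with tweepy you would pass the
# cursor's .items() iterator here instead (assumption, untested).
demo = reservoir_sample(range(10000), 5)
print(demo)
```

If the stream has fewer than k items, you simply get all of them back.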
Upvotes: 2