Random sampling tweets with tweepy

Question

I'm trying to analyze tweets that have the hashtag #contentmarketing. I first tried grabbing 20,000 tweets with tweepy but ran into the rate limit. So I'd like to take a random sample instead (or a couple random samples).

I'm not really familiar with random sampling through an API call. If I had an array that already contained the data, I would take random indices from that array without replacement. However, I don't think I can create that array in the first place without the rate limit kicking in.

Can anyone enlighten me on how to access random tweets (or random data from an API, overall)?

For reference, here's the code that got me in rate limit purgatory:

import json

import tweepy
from tweepy import OAuthHandler

consumerKey = 'my-key'
consumerSecret = 'my-key'
accessToken = 'my-key'
accessSecret = 'my-key'

auth = OAuthHandler(consumerKey, consumerSecret)
auth.set_access_token(accessToken, accessSecret)

api = tweepy.API(auth)

tweets = []

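# count is the page size (the search endpoint caps it at 100 per request);
# items(20000) limits the total number of tweets fetched.
for tweet in tweepy.Cursor(api.search, q='#contentmarketing', count=100,
                           lang='en', since='2017-06-20').items(20000):
    # Status objects are not JSON serializable, so keep the raw dict instead.
    tweets.append(tweet._json)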

with open('content-tweets.json', 'w') as f:
    json.dump(tweets, f, sort_keys=True, indent=4)

Sssssuppp · Accepted Answer

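To keep the rate limit from aborting your script, pass wait_on_rate_limit=True when you create the API object. Tweepy will then sleep until the rate-limit window resets instead of raising an error: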

api = tweepy.API(auth, wait_on_rate_limit=True)
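
If you still want a random sample rather than every tweet, one possible approach (a sketch, not part of the fix above) is reservoir sampling. It keeps a uniform random sample of k items from an iterator without holding the whole result set in memory. The helper name reservoir_sample and its rng parameter are hypothetical names chosen for this example. The sample is only random with respect to whatever the search endpoint returns, not with respect to all tweets ever posted.

```python
import random


def reservoir_sample(items, k, rng=None):
    """Return up to k items drawn uniformly, without replacement, from any iterable.

    Hypothetical helper for illustration; rng is an optional random.Random
    instance so that results can be made reproducible.
    """
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


# Stand-in data so the sketch runs without Twitter credentials.
fake_tweets = [{"id": n, "text": "tweet %d" % n} for n in range(1000)]
picked = reservoir_sample(fake_tweets, 10, rng=random.Random(42))
print(len(picked))
```

With tweepy, the same helper could presumably be fed the cursor directly, for example reservoir_sample((t._json for t in tweepy.Cursor(api.search, q='#contentmarketing').items()), 1000). Every tweet still has to be fetched, so this saves memory rather than API calls.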
