I'm using the streaming API to follow a specific user ID, and streaming works without any issues. However, when I compare all the tweets streamed in one day with the ones collected through the REST API, the streaming API seems to miss some retweets, i.e. tweets from the followed user that someone else retweeted.
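To make the comparison concrete, it helps to label each collected tweet by how it relates to the followed account. This is a minimal sketch, assuming both collections are stored as Twitter API v1.1 status JSON (which is what `status._json` holds in the listener below). `classify_tweet` and its category names are hypothetical helpers, not part of tweepy.

```python
from typing import Any, Dict


def classify_tweet(tweet: Dict[str, Any], user_id: str) -> str:
    """Label a v1.1 status dict relative to the followed account (hypothetical helper)."""
    author_id = tweet["user"]["id_str"]
    retweeted = tweet.get("retweeted_status")
    if retweeted is not None:
        original_author = retweeted["user"]["id_str"]
        if author_id == user_id:
            return "retweet_by_user"   # the followed user retweeted a tweet
        if original_author == user_id:
            return "retweet_of_user"   # someone else retweeted the followed user
        return "other_retweet"
    if author_id == user_id:
        return "own_tweet"
    if tweet.get("in_reply_to_user_id_str") == user_id:
        return "reply_to_user"
    return "other"


# Usage: a retweet by account 999 of a tweet written by account 123
rt = {"id_str": "2", "user": {"id_str": "999"},
      "retweeted_status": {"id_str": "1", "user": {"id_str": "123"}}}
print(classify_tweet(rt, "123"))  # retweet_of_user
```

Counting the `retweet_of_user` label separately in each collection isolates exactly the category that appears to be missing.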
I would have expected tweets to be missing from the REST API results because of deleted content, but I can't understand why any tweets would be missing from the stream.
I checked that I'm not hitting the rate limit (fewer than 200 tweets are collected over the whole day) and that the connection wasn't interrupted. I tried different days, and each time around 25% of the retweets are missing. No other types of tweets are missing.
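One way to measure the gap is to compare tweet IDs from the two collections. This sketch assumes the REST results were also saved as one JSON status per line, like `tweets.json`; `load_ids` and `missing_fraction` are hypothetical names, and a file such as `rest_tweets.json` is only an illustrative placeholder.

```python
import json
from typing import Set


def load_ids(path: str) -> Set[str]:
    """Read a JSON-lines file of v1.1 statuses and return their id_str values."""
    ids = set()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                ids.add(json.loads(line)["id_str"])
    return ids


def missing_fraction(stream_ids: Set[str], rest_ids: Set[str]) -> float:
    """Fraction of REST-collected IDs that never arrived on the stream."""
    if not rest_ids:
        return 0.0
    return len(rest_ids - stream_ids) / len(rest_ids)


# Usage with in-memory sets: one of four REST tweets never reached the stream
print(missing_fraction({"1", "2", "3"}, {"1", "2", "3", "4"}))  # 0.25
```

Filtering both ID sets down to retweets before calling `missing_fraction` gives the per-category rate rather than an overall one.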
Any help is much appreciated!!
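```python
import json
import sys

import tweepy  # tweepy 3.x; StreamListener was removed in tweepy 4.0


class StreamListener(tweepy.StreamListener):
    def __init__(self, output_file=sys.stdout):
        super(StreamListener, self).__init__()
        self.output_file = output_file

    def on_status(self, status):
        # Append each status to tweets.json as one JSON object per line
        with open('tweets.json', 'a') as tf:
            json.dump(status._json, tf)
            tf.write('\n')
        print(status.text, file=self.output_file)

    def on_error(self, status_code):
        # 420 = rate limited; returning False disconnects the stream
        if status_code == 420:
            return False


# `api` is a tweepy.API instance created elsewhere
stream_listener = StreamListener()
stream = tweepy.Stream(auth=api.auth, listener=stream_listener)
stream.filter(follow=['<id>'])  # follow expects a list of user ID strings
```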