bravopapa

Reputation: 445

Accessing Twitter Data: API/Cursor (twitter user) vs Streaming (hash_tag_list)

I was studying some code for accessing Twitter data. It was written by someone who shows how to access Twitter data on YouTube.

Please see code below (some parts are truncated):

from tweepy import API 
from tweepy import Cursor
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
 
import twitter_credentials
import numpy as np
import pandas as pd

class TwitterClient():
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)
        self.twitter_user = twitter_user
    def get_user_timeline_tweets(self, num_tweets):
        tweets = []
        for tweet in Cursor(self.twitter_client.user_timeline, id=self.twitter_user).items(num_tweets):
            tweets.append(tweet)
        return tweets        

class TwitterAuthenticator():
    def authenticate_twitter_app(self):
        auth = xxxx
        return auth

class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        self.twitter_autenticator = TwitterAuthenticator()    
    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # This handles Twitter authentication and the connection to the Twitter Streaming API
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_autenticator.authenticate_twitter_app() 
        stream = Stream(auth, listener)

        # This line filters the Twitter stream to capture tweets matching the keywords:
        stream.filter(track=hash_tag_list)


class TwitterListener(StreamListener):
     xxxxxxx

if __name__ == '__main__':

    hash_tag_list = ["donald trump", "hillary clinton", "barack obama", "bernie sanders"]
   
    twitter_client = TwitterClient('COVID19')
    print(twitter_client.get_user_timeline_tweets(1))
    twitter_streamer = TwitterStreamer()
    twitter_streamer.stream_tweets(fetched_tweets_filename, hash_tag_list)

From the code, I was wondering why there are two classes, TwitterClient() and TwitterStreamer(). TwitterStreamer() works with a hashtag list, while TwitterClient() is user-specific. Does that mean TwitterStreamer() is more like a large-scale search, while TwitterClient() is tied to a single user? Why separate them into two classes, and why use the TwitterStreamer() class only for the hashtags?

Can someone please comment on this bit of code? I am new to Twitter data exploration.

Thanks so much

Upvotes: 0

Views: 261

Answers (1)

mdeverna

Reputation: 332

The short answer is that these two classes access different endpoints; that is, they pull different data.

The magic of TwitterClient() happens in the get_user_timeline_tweets method, which is where the data gathering happens. This method accesses the user_timeline endpoint, which iteratively pulls up to 3,200 of that user's tweets (starting from the most recent tweet and going backward in time). In the code above, .items(num_tweets) caps how many of those tweets are returned.
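
To make the "going backward in time" part concrete, here is a toy sketch of the kind of paging that Cursor does for you around user_timeline. It does not use tweepy or the network at all: TIMELINE, fetch_page and iter_timeline are made-up names for illustration, and the real endpoint returns tweet objects rather than bare ids.

```python
# Toy stand-in for one user's timeline: tweet ids, newest first.
# This is NOT tweepy; every name here is hypothetical.
TIMELINE = list(range(100, 0, -1))  # id 100 is the newest, id 1 the oldest


def fetch_page(max_id=None, count=20):
    """Return up to `count` tweet ids that are <= max_id, newest first."""
    eligible = [t for t in TIMELINE if max_id is None or t <= max_id]
    return eligible[:count]


def iter_timeline(limit, page_size=20):
    """Yield up to `limit` tweet ids, walking backward one page at a time."""
    max_id = None
    yielded = 0
    while yielded < limit:
        page = fetch_page(max_id=max_id, count=page_size)
        if not page:
            break  # reached the oldest available tweet
        for tweet_id in page:
            if yielded == limit:
                return
            yield tweet_id
            yielded += 1
        max_id = page[-1] - 1  # next page starts just below the oldest id seen


recent = list(iter_timeline(limit=45))
print(recent[:3], len(recent))
```

The point is that this is a pull: you ask for tweets that already exist, page by page, and you stop when you have enough or when the history runs out.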

The TwitterStreamer() class has the stream_tweets method, which accesses Twitter's filter endpoint through the line stream.filter(track=hash_tag_list). This pulls only tweets that match the terms you pass it (despite the variable name, track takes words and phrases, not just hashtags, and the endpoint supports a number of other filters, e.g. users), and it does so in real time. That is, you get tweets as they are posted, as opposed to going back in time to get what has already been posted.
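
For contrast, here is an equally rough sketch of the streaming side, again without tweepy or any network access. CollectingListener, matches and run_stream are hypothetical names, and the matching rule is a simplification of how track behaves (a tweet matches if it contains every word of at least one tracked phrase); it ignores punctuation, hashtags and everything else the real service handles.

```python
class CollectingListener:
    """Hypothetical listener: stores each matching tweet as it arrives."""

    def __init__(self):
        self.received = []

    def on_data(self, text):
        self.received.append(text)


def matches(text, track):
    """True if any tracked phrase has all of its words in the tweet."""
    words = set(text.lower().split())
    return any(set(phrase.lower().split()) <= words for phrase in track)


def run_stream(incoming, listener, track):
    """Push each incoming tweet to the listener if it passes the filter."""
    for text in incoming:  # with a live stream this loop would not end
        if matches(text, track):
            listener.on_data(text)


incoming = [
    "Barack Obama gave a speech today",
    "I like pandas and numpy",
    "sanders and bernie on stage",
]
listener = CollectingListener()
run_stream(incoming, listener, track=["barack obama", "bernie sanders"])
print(listener.received)
```

Here the data is pushed to you: the listener's callback fires whenever a new matching tweet shows up, which is why the streaming code is built around a listener class instead of a method that returns a list.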

Upvotes: 1
