Reputation: 445
I was studying some code for accessing Twitter data. It was written by someone who shows how to access Twitter data in a YouTube tutorial.
Please see code below (some parts are truncated):
from tweepy import API
from tweepy import Cursor
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import twitter_credentials
import numpy as np
import pandas as pd


class TwitterClient():
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)
        self.twitter_user = twitter_user

    def get_user_timeline_tweets(self, num_tweets):
        tweets = []
        for tweet in Cursor(self.twitter_client.user_timeline, id=self.twitter_user).items(num_tweets):
            tweets.append(tweet)
        return tweets


class TwitterAuthenticator():
    def authenticate_twitter_app(self):
        auth = xxxx
        return auth


class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        self.twitter_autenticator = TwitterAuthenticator()

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # This handles Twitter authentication and the connection to the Twitter Streaming API
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_autenticator.authenticate_twitter_app()
        stream = Stream(auth, listener)
        # This line filters Twitter streams to capture data by the keywords:
        stream.filter(track=hash_tag_list)


class TwitterListener(StreamListener):
    xxxxxxx


if __name__ == '__main__':
    hash_tag_list = ["donald trump", "hillary clinton", "barack obama", "bernie sanders"]
    twitter_client = TwitterClient('COVID19')
    print(twitter_client.get_user_timeline_tweets(1))
    twitter_streamer = TwitterStreamer()
    twitter_streamer.stream_tweets(fetched_tweets_filename, hash_tag_list)
From the code, I was wondering why there are two classes, TwitterClient() and TwitterStreamer(). TwitterStreamer() works with a hashtag list while TwitterClient() is user-specific. Does that mean TwitterStreamer() is more like a large-scale search while TwitterClient() is tied to a single user? Why separate them into two classes, and why is TwitterStreamer() used only for the hashtags?
Can someone please comment on this bit of code, as I am new to Twitter data exploration?
Thanks so much
Upvotes: 0
Views: 261
Reputation: 332
The short answer here is that these two classes access different endpoints; that is, they pull different data.
The magic of TwitterClient() happens in the get_user_timeline_tweets method. This is where the data gathering happens. This method accesses the user_timeline endpoint, which iteratively pulls up to 3,200 tweets (starting from that user's most recent tweet and going backward in time).
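To make "iteratively pulls, going backward in time" concrete, here is a minimal toy sketch of that paging pattern. It does not use tweepy or the real API: FAKE_TIMELINE, fetch_page and iter_timeline are hypothetical names invented for illustration, and the real Cursor may differ in its details.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical in-memory "timeline": newest tweet first, ids shrink with age.
FAKE_TIMELINE: List[Dict] = [{"id": i, "text": f"tweet {i}"} for i in range(10, 0, -1)]


def fetch_page(max_id: Optional[int], count: int) -> List[Dict]:
    """Stand-in for one timeline request: the newest `count` tweets with id <= max_id."""
    eligible = [t for t in FAKE_TIMELINE if max_id is None or t["id"] <= max_id]
    return eligible[:count]


def iter_timeline(limit: int, page_size: int = 3) -> Iterator[Dict]:
    """Yield up to `limit` tweets, walking backward in time one page at a time."""
    max_id: Optional[int] = None
    yielded = 0
    while yielded < limit:
        page = fetch_page(max_id, page_size)
        if not page:
            return  # the timeline is exhausted
        for tweet in page:
            if yielded >= limit:
                return
            yield tweet
            yielded += 1
        # The next request starts just below the oldest tweet seen so far.
        max_id = page[-1]["id"] - 1


recent_ids = [t["id"] for t in iter_timeline(limit=5)]
print(recent_ids)
```

The caller only asks for a number of tweets; the loop decides how many requests that takes, which is roughly the convenience that Cursor(...).items(num_tweets) provides in the question's code.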
The TwitterStreamer() class has the stream_tweets method, which accesses Twitter's Filter endpoint with the line stream.filter(track=hash_tag_list). This only pulls tweets that include the hashtags you pass it (or a number of other filters, e.g. words, phrases, users), and it does so in real time. That is, you get tweets as they are sent, as opposed to going back in time to get what has already been posted.
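The contrast with the timeline approach can be sketched the same way. This is again a toy model without tweepy or network access: matches_track and filter_stream are hypothetical names, the "live feed" is just a list, and the case-insensitive substring match is a simplification of whatever matching rules the real Filter endpoint applies.

```python
from typing import Callable, Iterable, List


def matches_track(text: str, track: List[str]) -> bool:
    """True if any tracked phrase occurs in the text (simplified, case-insensitive)."""
    lowered = text.lower()
    return any(phrase.lower() in lowered for phrase in track)


def filter_stream(incoming: Iterable[str], track: List[str],
                  on_status: Callable[[str], None]) -> None:
    """Push each matching tweet to the callback as it arrives; nothing is looked up in the past."""
    for text in incoming:
        if matches_track(text, track):
            on_status(text)


# Hypothetical feed standing in for tweets arriving in real time.
live_feed = iter([
    "Barack Obama gives a speech",
    "What a nice day",
    "New poll about Bernie Sanders",
])

received: List[str] = []
filter_stream(live_feed, ["barack obama", "bernie sanders"], received.append)
print(received)
```

Here the callback plays the role of the listener in the question's code: the stream decides when data arrives and the listener only reacts to it, which is why streaming lives in its own class rather than inside TwitterClient().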
Upvotes: 1