KEVON

Reputation: 51

Tweepy public stream filter by a changing variable

I'm using the Tweepy library for Python to access the public Twitter stream, and I've run into a problem: once the stream is running, it doesn't stop. That makes sense for what it does, but I want it to start filtering with an empty list of user IDs and have IDs added later. When someone tweets a particular track word, their user ID should be appended to the list so that the stream follows all of their tweets from then on. The problem is that once the stream has started with its initial filter options, changing the variables doesn't affect the filter; it just keeps using the initial arguments.

userIDs = []
trackWords = ["#Obama"]

def stream():
    s = Stream(auth, StreamListener())
    s.filter(follow = userIDs, track = trackWords)

I was able to get around this earlier by calling the stream function again after a new keyword was added, but I now have multiple streams, each in its own thread so they can all run simultaneously. I can't figure out how to restart the threads, so refreshing the filter without calling the function again seems easier.

I'm fairly new to programming, so maybe this is a fundamental concept I don't know yet, but hopefully there's an easy trick to get it to refresh.

Here's all my relevant code if that helps anyone. The above was just a quick thing to help show what I'm talking about:

userIDs = []
userNames = []

account = ['@DMS_423']

publicKeyWords = ['the','be','to','of','and','are','is','were','was']

class AStreamListener(StreamListener):
    def on_status(self, status):
        if status.author.screen_name not in userNames:
            userNames.append(str(status.author.screen_name))
            userIDs.append(str(api.get_user(str(status.author.screen_name)).id))
            print status.author.screen_name, "has joined the game."

def uStream():
    s = Stream(auth, StreamListener())
    s.filter(follow = userIDs)

def pStream():
    ps = PStream(pAuth, PStreamListener())
    ps.filter(track = publicKeyWords)

def aStream():
    adds = Stream(auth, AStreamListener())
    adds.filter(track = account)

t1 = Thread(target = aStream)
t2 = Thread(target = uStream)
t3 = Thread(target = pStream)

def run():
    t1.start()
    t2.start()
    t3.start()

run()

Upvotes: 5

Views: 3838

Answers (2)

If you want to stop the stream after a certain number of tweets, initialise a counter in your listener (self.num_tweets = 0) and increment it in on_status. You can then use the count as a limiter inside on_status.
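A minimal sketch of that counter, assuming Tweepy 3.x, where returning False from on_status makes the stream disconnect. The class name LimitedListener and the limit parameter are made up for illustration, and the import fallback is only there so the snippet runs without Tweepy 3.x installed:

```python
try:
    from tweepy import StreamListener  # present in Tweepy 3.x
except ImportError:
    StreamListener = object  # stand-in so the sketch runs without Tweepy 3.x


class LimitedListener(StreamListener):
    """Hypothetical listener that stops after `limit` tweets."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit
        self.num_tweets = 0

    def on_status(self, status):
        self.num_tweets += 1
        # ... handle the tweet here ...
        # Returns False once the limit is reached; in Tweepy 3.x that
        # tells the stream to disconnect. True keeps it running.
        return self.num_tweets < self.limit
```

You would pass an instance to Stream in place of StreamListener(), for example Stream(auth, LimitedListener(limit=100)).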

Upvotes: 2

Dwight Gunning

Reputation: 2525

The Tweepy Python library doesn't support the behavior you're looking for. There's no way to modify the parameters of a stream once it has been subscribed to.
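To see why appending to the list later has no effect, here is a toy model (not Tweepy's actual code): the filter values are turned into a request string at the moment the stream is opened, so the open connection never looks at the list again. The function name build_request_body is hypothetical.

```python
def build_request_body(follow):
    # Snapshot the list contents into a string, roughly the way a
    # streaming request is built once when the connection is opened.
    return "follow=" + ",".join(follow)


user_ids = ["111"]
body = build_request_body(user_ids)  # the "stream" is opened here

user_ids.append("222")  # later change to the variable

# The request that was already sent is unchanged.
print(body)  # follow=111
```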

In fact, the Twitter API itself doesn't support changing parameters mid-stream. They go as far as cautioning against it. That's not to say that it wouldn't be possible to make it work (just be wary and avoid exceeding the rate limits).

I'd adjust your approach: initialise a second stream with the new query parameters, use the tweet IDs to avoid processing or persisting the same tweet twice, and close the initial stream once the second one is established.
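A rough sketch of that swap, kept independent of Tweepy so it can be run as is. SeenFilter, StreamSwapper and make_stream are hypothetical names; make_stream is assumed to be a function you supply that starts a stream in the background for the given user IDs and returns an object with a disconnect() method (a Tweepy Stream has one).

```python
import threading


class SeenFilter:
    """Thread-safe record of tweet IDs, used to drop duplicates
    while the old and new streams overlap."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def is_new(self, tweet_id):
        with self._lock:
            if tweet_id in self._seen:
                return False
            self._seen.add(tweet_id)
            return True


class StreamSwapper:
    """Starts a replacement stream, then closes the previous one."""

    def __init__(self, make_stream):
        self._make_stream = make_stream  # caller-supplied factory (assumption)
        self._current = None
        self._lock = threading.Lock()

    def restart(self, user_ids):
        # Open the new stream first, using a copy of the current ID list.
        new_stream = self._make_stream(list(user_ids))
        with self._lock:
            old, self._current = self._current, new_stream
        # Only then close the old stream, so no tweets are missed in between.
        if old is not None:
            old.disconnect()
        return new_stream
```

In your code, the factory would probably create a Stream and call filter(follow=ids) in non-blocking mode (I believe the keyword is is_async in Tweepy 3.7 and later, and async before that), the listener's on_status would call is_new(status.id) before handling a tweet, and AStreamListener would call restart(userIDs) after appending a new ID. Keep the rate limits in mind if users join frequently.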

Upvotes: 2
