Nahid O.

Reputation: 301

Filtering Twitter data using Tweepy

I've used Marco Bonzanini's tutorial on mining Twitter data: https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/

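from tweepy import Stream
from tweepy.streaming import StreamListener


class MyListener(StreamListener):

    def on_data(self, data):
        # data is the raw JSON string of each message from the stream
        try:
            # Append every message to python.json
            with open('python.json', 'a') as f:
                f.write(data)
                return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True

    def on_error(self, status):
        # Print the HTTP error code returned by the streaming API
        print(status)
        return True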

and used the "follow" parameter of the filter method to retrieve the tweets produced by a specific user ID:

# auth is the OAuthHandler set up as in the tutorial
twitter_stream = Stream(auth, MyListener())
twitter_stream.filter(follow=["63728193"])  # a random Twitter user ID

However, this doesn't do what I need: besides the tweets and retweets created by that user, it also returns every tweet in which the ID is involved (e.g. other people's retweets of their tweets). That is not what I want.

I'm sure there must be a way to do this, since the JSON returned by Twitter has a "screen_name" field that gives the name of the tweet's creator. I just have to find out how to filter the data on this screen_name field.
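Filtering on that field is straightforward once each message is parsed. Below is a minimal sketch for the file written by the listener above, assuming python.json holds one JSON message per line; `tweets_by_screen_name` and the sample messages are hypothetical, not part of Tweepy.

```python
import json


def tweets_by_screen_name(lines, screen_name):
    """Yield tweets from line-delimited JSON whose author has screen_name."""
    wanted = screen_name.lower()  # Twitter screen names are case-insensitive
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between stream messages
        msg = json.loads(line)
        user = msg.get("user")
        # Non-tweet messages (e.g. delete notices) carry no "user" object
        if user and user.get("screen_name", "").lower() == wanted:
            yield msg


# Hypothetical sample messages; in practice you would pass an open python.json
sample = [
    '{"text": "hello", "user": {"id_str": "63728193", "screen_name": "SomeAccount"}}',
    '{"delete": {"status": {"id_str": "1", "user_id_str": "63728193"}}}',
    '{"text": "@SomeAccount hi", "user": {"id_str": "42", "screen_name": "other"}}',
]
print([t["text"] for t in tweets_by_screen_name(sample, "someaccount")])  # ['hello']
```

With real data you would call it as `tweets_by_screen_name(open('python.json'), 'SomeAccount')`. Matching on `user['id_str']` instead is more robust, because a user can change their screen name but not their ID.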

Upvotes: 0

Views: 2213

Answers (1)

asongtoruin

Reputation: 10359

This behaviour is by design. To quote the Twitter streaming API docs:

For each user specified, the stream will contain:

  • Tweets created by the user.
  • Tweets which are retweeted by the user.
  • Replies to any Tweet created by the user.
  • Retweets of any Tweet created by the user.
  • Manual replies, created without pressing a reply button (e.g. “@twitterapi I agree”).
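To see which of these categories a given message falls into, you can inspect a few fields of the v1.1 Tweet JSON: `user.id_str` (the author), `retweeted_status` (present on retweets), `in_reply_to_user_id_str` and `entities.user_mentions`. The helper below is a hypothetical sketch (the function name and labels are my own, and it ignores cases such as quoted tweets); it shows that only the first two categories have the followed account as the author.

```python
def classify(tweet, user_id):
    """Label why a tweet was delivered on a follow= stream for user_id (a string)."""
    author = tweet.get("user", {}).get("id_str")
    retweeted = tweet.get("retweeted_status")
    if author == user_id:
        # Written or retweeted by the followed user: the cases the asker wants
        return "own retweet" if retweeted else "own tweet"
    if retweeted and retweeted.get("user", {}).get("id_str") == user_id:
        return "retweet of own tweet"
    if tweet.get("in_reply_to_user_id_str") == user_id:
        return "reply"
    mentions = tweet.get("entities", {}).get("user_mentions", [])
    if any(m.get("id_str") == user_id for m in mentions):
        return "mention"
    return "other"


uid = "63728193"
samples = [
    {"user": {"id_str": uid}},
    {"user": {"id_str": uid}, "retweeted_status": {"user": {"id_str": "1"}}},
    {"user": {"id_str": "2"}, "retweeted_status": {"user": {"id_str": uid}}},
    {"user": {"id_str": "3"}, "in_reply_to_user_id_str": uid},
    {"user": {"id_str": "4"}, "entities": {"user_mentions": [{"id_str": uid}]}},
]
print([classify(t, uid) for t in samples])
# ['own tweet', 'own retweet', 'retweet of own tweet', 'reply', 'mention']
```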

The best way to get what you want is to check who created each tweet as it is received and discard the rest, which I believe can be done as follows:

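import json

from tweepy.streaming import StreamListener

USER_ID = "63728193"  # the same ID passed to follow=


class MyListener(StreamListener):
    def on_data(self, data):
        try:
            # data is the raw JSON string from the stream, so parse it first
            tweet = json.loads(data)
            # Keep only tweets authored by the followed user. Compare id_str,
            # since 'id' is an integer. Messages such as delete notices have
            # no 'user' key and are skipped.
            if tweet.get('user', {}).get('id_str') == USER_ID:
                with open('python.json', 'a') as f:
                    f.write(data)
        except Exception as e:
            print("Error on_data: %s" % str(e))
        return True

    def on_error(self, status):
        print(status)
        return True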

Upvotes: 2
