user1634278

Reputation: 21

Python - Tweepy Regex

I'm using the Tweepy streamer to collect tweets for certain tags, in this case #python. The streamer part of the script works fine, but I'm struggling to extract the information I need from the output.

Tweepy sample: {"created_at":"Fri Aug 05 17:27:00 +0000 2016","id":761614496666361857,"id_str":"761614496666361857","text":"Use different Python version with virtualenv #python #virtualenv #virtualenvwrapper https://t.co/ecedKrCX0L","source":

From the sample, I want to extract and print only the tweet text (the value of the "text" field), but I can't get this to work properly. So far I've come up with:

class MyListener(StreamListener):
        def on_data(self, data):
                try:
                        pattern = re.compile(r'"text":"(.*?)","')
                        for line in data:
                                        x = pattern.search(data)
                                        f = open('tmp', 'a')
                                        f.write(data)
                                        f.close
                                        return True
                        else:
                                pass

                except BaseException as e:
                        print("Error on_data: %s" % str(e))
                        return True

However, this doesn't extract the specifics I'm after and still outputs the full Tweepy payload.

Any assistance would be appreciated!

Thanks

Upvotes: 0

Views: 459

Answers (1)

Tim Givois

Reputation: 2014

The easiest way to extract the text is with the json module.

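import json

# Import path for Tweepy 3.x, where StreamListener is available.
from tweepy.streaming import StreamListener


class MyListener(StreamListener):

    def on_data(self, data):
        try:
            # json.loads returns the parsed dict; the raw string itself is unchanged.
            tweet = json.loads(data)
            # Append only the tweet text, one tweet per line.
            with open('tmp', 'a') as f:
                f.write(tweet["text"] + "\n")
        except Exception as e:
            # Messages without a "text" key (e.g. delete notices) end up here.
            print("Error on_data: %s" % str(e))
        # Keep the stream running whether or not this message was written.
        return True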
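
To check the parsing step without a live stream, the same idea can be sketched as a standalone function. This is only a minimal sketch: extract_text and the shortened sample payload are hypothetical, made up for illustration, and are not part of Tweepy.

```python
import json


def extract_text(raw):
    """Return the "text" field of a raw tweet payload, or None if absent."""
    try:
        tweet = json.loads(raw)
    except ValueError:
        # Not valid JSON (for example an empty keep-alive line).
        return None
    if not isinstance(tweet, dict):
        return None
    # Delete notices and other stream messages carry no "text" key.
    return tweet.get("text")


# Shortened, made-up payload in the shape of the sample from the question.
sample = '{"id_str":"1","text":"Use Python, with \\"virtualenv\\" #python","source":"web"}'
print(extract_text(sample))
```

Because the JSON parser does the work, commas and escaped quotes inside the tweet text are handled without any special casing.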

If you would rather use a regular expression, the code would look like this:

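import re

# Import path for Tweepy 3.x, where StreamListener is available.
from tweepy.streaming import StreamListener


class MyListener(StreamListener):
    # Capture the "text" value; commas and escaped characters such as \" are allowed inside it.
    pattern = re.compile(r'"text":"((?:[^"\\]|\\.)*)"')

    def on_data(self, data):
        try:
            # Search the payload once; "for line in data" would loop over single characters.
            match = self.pattern.search(data)
            if match:
                # Write the captured group only, not the whole payload.
                with open('tmp', 'a') as f:
                    f.write(match.group(1) + "\n")
        except Exception as e:
            print("Error on_data: %s" % str(e))
        # Keep the stream running whether or not a match was found.
        return True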
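
One caveat with the regex route is that whatever it captures is still JSON-escaped, so sequences such as \" or \/ come out literally. A small sketch of the difference, assuming a made-up payload (including a made-up URL) and a hypothetical pattern that tolerates escaped quotes:

```python
import json
import re

# Hypothetical pattern: match the "text" value, allowing backslash escapes inside it.
TEXT_RE = re.compile(r'"text":"((?:[^"\\]|\\.)*)"')

# Made-up payload for illustration only.
raw = '{"text":"see \\"docs\\" at https:\\/\\/t.co\\/x","source":"web"}'

# The regex capture keeps the backslashes exactly as they appear in the payload.
captured = TEXT_RE.search(raw).group(1)
print(captured)

# Wrapping the capture in quotes and parsing it as a JSON string undoes the escaping.
decoded = json.loads('"%s"' % captured)
print(decoded)

# The decoded capture matches what the json module returns directly.
print(decoded == json.loads(raw)["text"])
```

Since the capture has to go through the JSON parser anyway to be readable, parsing the whole payload with json.loads is usually the simpler choice.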

Upvotes: 1
