Reputation: 21
I'm using Tweepy's streamer to collect tweets for certain hashtags, in this case #python. The streaming part of the script works fine, but I'm struggling to extract information from the output.
Tweepy sample: {"created_at":"Fri Aug 05 17:27:00 +0000 2016","id":761614496666361857,"id_str":"761614496666361857","text":"Use different Python version with virtualenv #python #virtualenv #virtualenvwrapper https://t.co/ecedKrCX0L","source":
From the sample, I want to extract and print the value of the "text" field (shown in bold in the original post), but I can't get this to work properly. So far I've come up with:
class MyListener(StreamListener):
    def on_data(self, data):
        try:
            pattern = re.compile(r'"text":"(.*?)","')
            for line in data:
                x = pattern.search(data)
                f = open('tmp', 'a')
                f.write(data)
                f.close
                return True
            else:
                pass
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True
However, this doesn't extract the part I'm after; it still outputs the full Tweepy payload.
Any assistance would be appreciated!
Thanks
Upvotes: 0
Views: 459
Reputation: 2014
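The easiest way to extract the text is with the json module. Parse the raw string into a dict first, then read its "text" key:

import json

from tweepy.streaming import StreamListener  # Tweepy 3.x import path


class MyListener(StreamListener):
    def on_data(self, data):
        try:
            # data arrives as a raw JSON string; parse it into a dict
            tweet = json.loads(data)
            # Not every stream message is a tweet (e.g. delete notices)
            if "text" in tweet:
                with open('tmp', 'a') as f:
                    f.write(tweet["text"] + "\n")
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True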
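To try the json approach without a live stream, here is a minimal sketch that runs the same parsing on a trimmed copy of the sample from the question. The helper name extract_text and the shortened sample string are my own, for illustration only; they are not part of Tweepy.

```python
import json


def extract_text(raw):
    """Return the "text" value of one raw stream message, or None."""
    try:
        message = json.loads(raw)
    except ValueError:
        # Not valid JSON, e.g. an empty keep-alive line
        return None
    if not isinstance(message, dict):
        return None
    # .get() returns None for messages that carry no "text" key
    return message.get("text")


# Hypothetical, shortened version of the sample in the question
sample = (
    '{"created_at":"Fri Aug 05 17:27:00 +0000 2016",'
    '"id":761614496666361857,'
    '"text":"Use different Python version with virtualenv"}'
)

print(extract_text(sample))
```

Inside on_data you could call the helper on data and write the result only when it is not None.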
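If you would rather use a regular expression, write the captured group instead of the whole payload. Note that the captured text is still JSON-escaped (for example \" and \uXXXX sequences are left as-is):

import re

# Capture the first "text" value; (?:[^"\\]|\\.)* skips over escaped quotes
TEXT_PATTERN = re.compile(r'"text":"((?:[^"\\]|\\.)*)"')


class MyListener(StreamListener):
    def on_data(self, data):
        try:
            # Search the whole payload once; looping over data iterates characters
            match = TEXT_PATTERN.search(data)
            if match:
                with open('tmp', 'a') as f:
                    # Write only the captured text, not the full payload
                    f.write(match.group(1) + "\n")
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True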
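As a rough illustration of why I would lean towards json here, this sketch compares a regex capture with json.loads on a made-up message whose text contains quotes and a comma. The pattern and the sample message are assumptions for illustration, not real stream output.

```python
import json
import re

# Illustrative pattern: captures a "text" value, allowing escaped characters
TEXT_PATTERN = re.compile(r'"text":"((?:[^"\\]|\\.)*)"')

# Build a compact JSON string the way it would arrive on the wire
raw = json.dumps(
    {"id": 1, "text": 'He said "hi", then left'},
    separators=(",", ":"),
)

regex_text = TEXT_PATTERN.search(raw).group(1)
json_text = json.loads(raw)["text"]

print(regex_text)  # He said \"hi\", then left
print(json_text)   # He said "hi", then left
```

The regex hands back the escaped form, so you would still have to unescape it yourself, whereas json.loads returns the text as it was tweeted.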
Upvotes: 1