Reputation: 2656
Good morning everyone,
I'm having trouble with my Twitter bot - I need to dump the streamed tweets (which arrive as JSON) to a file.
Previously I did this by writing them as UTF-8 formatted strings; however, it now turns out that I still need to filter some of the data, so storing it as JSON in the file seemed like the easiest way to go.
I edited the code accordingly:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import datetime
import json

access_token = #####
access_token_secret = #####
consumer_key = #####
consumer_secret = #####

class StdOutListener(StreamListener):
    def on_status(self, status):
        print(status)
        today = datetime.datetime.now()
        with open('/git/twttrbots/data/Twitter_Raw %s' %
                  today.strftime("%a-%Y-%m-%d"), 'a') as f:
            json.dump(status, f)  # <- doesn't work
            #f.write(json.dumps(status))  # <- doesn't work
            #f.write("Blah")  # <- works perfectly fine

if __name__ == '__main__':
    while True:
        try:
            #login using auth
            l = StdOutListener()
            auth = OAuthHandler(consumer_key, consumer_secret)
            auth.set_access_token(access_token, access_token_secret)
            stream = Stream(auth, l)
            #filter by hashtag
            stream.filter(track=['bitcoin', 'cryptocurrency', 'wonderlandcoin',
                                 'btc', 'fintech', 'satoshi', 'blockchain',
                                 'litecoin', 'btce'])
        except:
            print("Whoops, disconnected at %s. Retrying"
                  % datetime.datetime.now())
            continue
The file is created and the status is definitely read (there's print output in my terminal), but somewhere along the way my data is blasted out into nirvana instead of into my file, which remains empty at 0 bytes.
I found similar cases here and on other platforms; however, they used json.dumps() instead of json.dump(). I have tried both functions (using f.write(json.dumps(status))), but neither of them seems to work.
Now, I'm not a complete fool; I am well aware that the problem is probably on my end and not a JSON error, but I can't figure out what it is I am doing wrong.
The only thing I was able to do is boil it down to an error that occurs in my with open() statement, leading me to believe it's something about either the open() mode or the way I write my data to the file. I know this because the answer to the question linked above works fine on my machine.
I could, of course, use the subprocess module and call a pipe that dumps the print(status) to a file, but that can't be the solution to this?
Addendum
As requested, here's my console output.
Here's what the logger caught when I called logger.debug('status dump: %s', json.dumps(status)).
Upvotes: 2
Views: 2242
Reputation: 160457
An initial observation I have to make (because it got frustrating when I tried to run this script):
Don't make your except clauses too broad. You catch everything (including KeyboardInterrupt), which makes it hard to stop execution.
This is optional, but it's good to add the appropriate interrupt handler:
except KeyboardInterrupt:
    exit()
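To see why the bare except gets in the way, here is a small sketch that runs without tweepy. The function names (swallow_everything, swallow_errors_only, press_ctrl_c) are made up for this illustration; the point is only that KeyboardInterrupt is not a subclass of Exception, so a bare except traps it while except Exception lets it through.

```python
def swallow_everything(action):
    """Mimics the question's bare except: nothing gets past it."""
    try:
        action()
    except:
        return "swallowed"
    return "ok"


def swallow_errors_only(action):
    """Catches ordinary errors but lets KeyboardInterrupt propagate."""
    try:
        action()
    except Exception as e:
        return "swallowed %s" % type(e).__name__
    return "ok"


def press_ctrl_c():
    # Simulates the user pressing Ctrl+C inside the try block.
    raise KeyboardInterrupt


# KeyboardInterrupt derives from BaseException, not from Exception.
print(issubclass(KeyboardInterrupt, Exception))

# The bare except silently eats the interrupt, so a `while True` loop keeps going.
print(swallow_everything(press_ctrl_c))

# With `except Exception`, the interrupt reaches the caller and can stop the program.
try:
    swallow_errors_only(press_ctrl_c)
    interrupted = False
except KeyboardInterrupt:
    interrupted = True
print(interrupted)
```

Inside a retry loop like the one in the question, the second form is usually what you want: connection errors are retried, and Ctrl+C still ends the script.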
The second thing that is making your life a bit harder: not only are you catching everything with the bare except, you are also not printing the corresponding error. Printing it will reveal the culprit in this case, point you in the right direction and make your life so much easier.
except Exception as e:
    print("Error: ", e)
    print("Whoops, disconnected at %s. Retrying"
          % datetime.datetime.now())
This outputs a (rather disgusting) message which essentially prints out the Status object and ends with a line informing you that this object:
is not JSON serializable
This is somewhat logical, since what we are dealing with here isn't a json object but, instead, a Status object returned from tweepy.Stream.
I have no idea why exactly the creator(s) of tweepy have done this, and I believe there are solid reasons behind it, but to solve your issue you can simply access the underlying ._json attribute:
json.dump(status._json, f)
Now, you should be good to go.
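As a rough illustration that runs without a Twitter connection, here is a minimal sketch. FakeStatus is a hypothetical stand-in for tweepy's Status object (it only mimics the _json attribute), and append_status and read_statuses are helper names I made up. Writing one JSON document per line is my own choice here, since it makes it easy to read the file back for filtering later.

```python
import json
import os
import tempfile


class FakeStatus:
    """Hypothetical stand-in for tweepy's Status; it only mimics the _json attribute."""

    def __init__(self, payload):
        self._json = payload


def append_status(status, path):
    """Append the raw tweet dict to the file, one JSON document per line."""
    with open(path, 'a') as f:
        json.dump(status._json, f)
        f.write('\n')


def read_statuses(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


status = FakeStatus({'id': 1, 'text': 'bitcoin is up'})

# Dumping the wrapper object itself fails, as in the question.
try:
    json.dumps(status)
    dump_failed = False
except TypeError as e:
    dump_failed = True
    print("Error:", e)

# Dumping the plain dict behind it works.
path = os.path.join(tempfile.mkdtemp(), 'tweets.jsonl')
append_status(status, path)
append_status(FakeStatus({'id': 2, 'text': 'litecoin too'}), path)

tweets = read_statuses(path)
print(tweets)
```

Note that the failed json.dump in the question happens after open() has already created the file, which would explain why it exists but stays at 0 bytes.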
This seems to be an internal tweepy issue relating to the transition from Python 2 to Python 3.x. Specifically, in file streaming.py:
  File "/home/jim/anaconda/envs/Python3/lib/python3.5/site-packages/tweepy/streaming.py", line 171, in read_line
    self._buffer += self._stream.read(self._chunk_size) <--
TypeError: Can't convert 'bytes' object to str implicitly
A solution has been proposed (and, according to the replies, it works) on the tweepy GitHub repository by user cozos, suggesting:
In streaming.py:
I changed line 161 to:
    self._buffer += self._stream.read(read_len).decode('ascii')
and line 171 to:
    self._buffer += self._stream.read(self._chunk_size).decode('ascii')
and then reinstalled.
I'm not sure what he means by 'reinstalled', though.
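The underlying problem can be reproduced without tweepy. In the sketch below, io.BytesIO is my stand-in for the HTTP response stream and the variable names are made up; it only shows that Python 3 refuses to append bytes to a str, and that decoding first (as in the quoted fix) avoids the error.

```python
import io

# Stand-in for the response body: read() returns bytes, as it does on Python 3.
stream = io.BytesIO(b'{"text": "btc"}\r\n')
buffer = ''  # a str buffer, like self._buffer in the traceback

chunk = stream.read(8)

# str += bytes raises TypeError on Python 3 (Python 2 allowed it).
try:
    buffer += chunk
    concat_failed = False
except TypeError as e:
    concat_failed = True
    print("Error:", e)

# Decoding the bytes first, as in the quoted fix, makes the concatenation valid.
buffer += chunk.decode('ascii')
buffer += stream.read().decode('ascii')
print(repr(buffer))
```

One caveat that is my own assumption, not part of the quoted fix: decode('ascii') raises UnicodeDecodeError on any non-ASCII byte, so if the payload can contain such bytes, 'utf-8' would be the safer codec.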
Use tweepy with Python 2.7.10. It works like a charm.
Upvotes: 1