Hello,
Using Python 2.7 & Tweepy Library
Main topic: Downloading tweets from Streaming API using Python.
I am confused by the different formats of tweets downloaded from the Streaming API: the same tweet looks different depending on how I print it.
Note: I am only interested in Arabic tweets.
1st format is:
{"created_at":"Wed Feb 03 12:52:53 +0000 2016","id":694866144142848001,"id_str":"694866144142848001","text":"\u06 ………
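Code used for the 1st format:
# -*- coding: utf-8 -*-
# (encoding declaration: Python 2 needs it for the Arabic literals below)
import tweepy
import json
from tweepy import Stream
from tweepy.streaming import StreamListener

consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

class StdOutListener(StreamListener):
    def on_data(self, data):
        # data is the raw JSON string exactly as Twitter sent it
        print(data)
        # 'tweets.json' is a placeholder name; the original post does not show how the file was opened
        with open('tweets.json', 'a') as f:
            f.write(data)
        return True  # keep the stream open

if __name__ == '__main__':
    # OAuth process, using the keys and tokens
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    listener = StdOutListener()
    stream = Stream(auth, listener)
    stream.filter(track=[u'الى', u'إلى', u'عشان', u'علشان', u'ماشى', u'ليه', u'ازاى'])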
============================================
2nd format is:
{u'contributors': None, u'truncated': False, u'text': u'\u0627', u'is_quote_status': False,….
Code used for the 2nd format:
    def on_data(self, data):
        print json.loads(data)
Note: I get an error when I try to write the result of json.loads(data) to a file.
=================================================
3rd format is:
{"contributors": null, "truncated": false, "text": "RT @a_meles: @EHSANFAKEEH\n\u0627 ", "is_quote_status": false, "in_reply_to_status_id": null, "id": 695174171903582208, "favorite_count": 0, "source": "http://twitter.com/download/android\" rel=\"nofollow\">Twitter f……
Code used for the 3rd format:
    def on_data(self, data):
        x = json.loads(data)
        print(json.dumps(x))
================================================
4th format is:
Status(contributors=None, truncated=False, text=u'@AlsaeedFajer \u0627\ ', is_quote_status=False, in_reply_to_status_id=None, id=694494200520413184L, favorite_count=0, _api=, author=User(follow_request_sent=None, profile_use_background_image=True, _json={u'follow_reques……….
Code used for the 4th format (on_status instead of on_data):
    def on_status(self, status):
        print status
============================================
So, what is the usual way to extract the tweet text and write it to a file without problems?
Thanks for your efforts,
An Answer for the same question
Ideally, you want to work with & store JSON objects as they are sent to you from Twitter:
1st format: the on_data method of a stream listener receives every message as a raw string: see https://github.com/tweepy/tweepy/blob/master/docs/streaming_how_to.rst This may or may not be what you want to store in a file for later.
2nd format: print json.loads(data) prints the string representation of the Python object you create with json.loads (usually a dict). That representation is not valid JSON.
3rd format: the result is almost the same as the 1st, but you are deserializing a JSON object from a string and then serializing it again straight away. The difference between the 1st and 3rd: the order of fields like "created_at" and "id" may change depending on the library.
4th format: a string representation of a Status Python object. A tweepy Status object is not JSON, but it has a _json property that contains the JSON response from Twitter.
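The relationship between the first three formats can be reproduced without connecting to Twitter at all. This is only a small sketch using the standard json module; the `raw` string is a made-up sample standing in for what on_data receives, not real stream output.

```python
# -*- coding: utf-8 -*-
import json

# Made-up sample standing in for the raw string passed to on_data (1st format).
raw = '{"id": 1, "text": "\\u0627\\u0644\\u0649"}'

# 2nd format: json.loads turns the JSON text into a Python dict.
tweet = json.loads(raw)

# 3rd format: json.dumps turns the dict back into JSON text.
# Non-ASCII characters are written as \uXXXX escapes by default.
round_trip = json.dumps(tweet)

# The escapes are only a way of writing the characters down:
# parsing the text again gives back exactly the same dict.
same_content = json.loads(round_trip) == tweet

print(round_trip)
print(same_content)
```

The \u0627-style escapes in the 1st and 3rd formats are therefore not corrupted Arabic; they decode back to the original characters when the JSON is parsed.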
It should work if you use the on_status() listener method and store status._json in a file for later (not the Status object itself).
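If only the tweet text is needed, one possible approach is to parse each message and append its "text" field to a UTF-8 file. The helper below is a sketch, not part of tweepy: the name `save_tweet_text` and the file path are hypothetical, and it assumes each message is a JSON string like the one on_data receives.

```python
# -*- coding: utf-8 -*-
import io
import json


def save_tweet_text(raw, path):
    """Append the text of one tweet to a UTF-8 file.

    raw  -- a JSON string (assumed to be one streaming message)
    path -- output file name (hypothetical, chosen by the caller)
    Returns the text that was written, or None if the message has no text.
    """
    message = json.loads(raw)
    text = message.get("text")
    if text is None:
        # Not every streaming message is a tweet (for example delete notices).
        return None
    # io.open with an explicit encoding behaves the same on Python 2 and 3,
    # so the Arabic characters are written as real UTF-8 text, one tweet per line.
    with io.open(path, "a", encoding="utf-8") as out:
        out.write(text.replace(u"\n", u" ") + u"\n")
    return text
```

Inside a listener this could be called as save_tweet_text(data, 'tweets.txt') from on_data, or with json.dumps(status._json) from on_status. Opening the file for every tweet is simple but slow; keeping one file handle open for the lifetime of the listener may be preferable for high-volume streams.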
Hope that helps!
This is not my answer; I am only passing it along to spread the knowledge. Thanks to its author.