Reputation: 224
Hello, I am trying to scrape the tweets of a certain user using tweepy. Here is my code:
import time

# api is an authenticated tweepy.API instance (setup not shown)
tweets = []
username = 'example'
count = 140  # number of tweets
try:
    # Pulling individual tweets from query
    for tweet in api.user_timeline(id=username, count=count, include_rts=False):
        # Adding to list that contains all tweets
        tweets.append((tweet.text))
except BaseException as e:
    print('failed on_status,', str(e))
    time.sleep(3)
The problem I am having is the tweets are coming back unfinished with "..." at the end.
I think I've looked at all the other similar problems on Stack Overflow and elsewhere, but nothing works. Most do not concern me because I am NOT dealing with retweets.
I have tried passing tweet_mode = 'extended' and/or reading tweet.full_text or tweet._json['extended_tweet']['full_text'], in different combinations.
I don't get an error message, but nothing works: I just get an empty list in return.
And It looks like the documentation is out of date because it says nothing about the 'tweet_mode' nor the 'include_rts' parameter :
Has anyone managed to get the full text of each tweet?? I'm really stuck on this seemingly simple problem and am losing my hair so I would appreciate any advice :D Thanks in advance!!!
Upvotes: 4
Views: 4156
Reputation: 187
As per the Twitter API v2, tweet_mode does not work at all. You need to add expansions=referenced_tweets.id to the request. Then, in the response, look for includes: the tweets that came back truncated appear there as full tweets. You will still see the truncated tweets in the main response, but do not worry about them.
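To make the lookup concrete, here is a minimal sketch of matching truncated tweets against includes in a v2 response that has already been parsed into a dict. It assumes the truncated entries are retweets that carry a referenced_tweets list, and the function name and sample payload are made up for illustration.

```python
def resolve_full_texts(payload):
    """Return one text per tweet in payload["data"].

    When a tweet is a retweet and the original is present in
    payload["includes"]["tweets"], the original's text is used instead.
    """
    # Index the expanded tweets by id for quick lookup.
    included = {
        t["id"]: t["text"]
        for t in payload.get("includes", {}).get("tweets", [])
    }
    texts = []
    for tweet in payload.get("data", []):
        text = tweet["text"]
        for ref in tweet.get("referenced_tweets", []):
            if ref["type"] == "retweeted" and ref["id"] in included:
                text = included[ref["id"]]
                break
        texts.append(text)
    return texts


# Hypothetical payload, shaped like a v2 response with
# expansions=referenced_tweets.id (values are invented).
sample = {
    "data": [
        {"id": "1", "text": "a plain tweet"},
        {
            "id": "2",
            "text": "RT @someone: the complete orig…",
            "referenced_tweets": [{"type": "retweeted", "id": "3"}],
        },
    ],
    "includes": {"tweets": [{"id": "3", "text": "the complete original text"}]},
}

print(resolve_full_texts(sample))
```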
Upvotes: 1
Reputation: 15987
TL;DR: You're most likely running into a rate limiting issue. Also, use the full_text attribute.
Long version:
First,
The problem I am having is the tweets are coming back unfinished with "..." at the end.
From the Tweepy documentation on Extended Tweets, this is expected:
Compatibility mode
... It will also be discernible that the text attribute of the Status object is truncated as it will be suffixed with an ellipsis character, a space, and a shortened self-permalink URL to the Tweet.
Wrt
And It looks like the documentation is out of date because it says nothing about the 'tweet_mode' nor the 'include_rts' parameter :
They haven't explicitly added it to the documentation of each method; however, they specify that tweet_mode is added as a param:
Standard API methods
Any tweepy.API method that returns a Status object accepts a new tweet_mode parameter. Valid values for this parameter are compat and extended, which give compatibility mode and extended mode, respectively. The default mode (if no parameter is provided) is compatibility mode.
So without tweet_mode added to the call, you do get the tweets with partial text? And with it, all you get is an empty list? If you remove it and immediately retry, check whether you still get an empty list. That is, once you get an empty list, see if you keep getting one even after you change the params back to the ones that worked.
Based on bug #1329 - API.user_timeline sometimes returns an empty list - it appears to be a Rate Limiting issue:
This API limitation would manifest itself as exactly the issue you're describing.
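If you want to test that theory, one option is to retry the same call a few times with a pause in between and see whether the list stays empty. This is only a rough sketch: fetch_with_retry is a made-up helper, not part of Tweepy, and the attempt count and delay are arbitrary assumptions.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=3.0, sleep=time.sleep):
    """Call fetch() until it returns a non-empty result or attempts run out.

    fetch is any zero-argument callable, for example
    lambda: api.user_timeline(...). Returns [] if every attempt is empty.
    """
    for attempt in range(attempts):
        result = fetch()
        if result:
            return result
        # Pause before the next try, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay)
    return []
```

If the call starts returning tweets again after a pause, with the parameters unchanged, that points at the limit rather than at tweet_mode.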
Even if it was working, the full text is in the full_text attribute, not the usual text. So the line
tweets.append((tweet.text))
should be
tweets.append(tweet.full_text)
(and you can skip the extra enclosing parentheses).
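Putting both changes together, the question's loop could look roughly like the sketch below. collect_full_texts is a name I made up, api is assumed to be an authenticated tweepy.API object, and I pass the user as screen_name rather than the question's id, on the assumption that the value is a handle.

```python
def collect_full_texts(api, username, count=140):
    """Return the untruncated text of a user's recent tweets.

    api is assumed to be an authenticated tweepy.API instance.
    """
    statuses = api.user_timeline(
        screen_name=username,
        count=count,
        include_rts=False,
        tweet_mode="extended",  # without this, only the truncated text is set
    )
    # In extended mode the text lives in full_text, not text.
    return [status.full_text for status in statuses]
```

Usage would be something like tweets = collect_full_texts(api, 'example').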
Btw, if you're not interested in retweets, see this example for the correct way to handle them:
Given an existing tweepy.API object and id for a Tweet, the following can be used to print the full text of the Tweet, or if it’s a Retweet, the full text of the Retweeted Tweet:
status = api.get_status(id, tweet_mode="extended")
try:
    print(status.retweeted_status.full_text)
except AttributeError:  # Not a Retweet
    print(status.full_text)
If status is a Retweet, status.full_text could be truncated.
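If you ever do include retweets in a timeline, the same check can be wrapped in a small helper and applied to each status. This is just a sketch: status_full_text is a made-up name, and the objects below are stand-ins for Status objects fetched with tweet_mode="extended".

```python
from types import SimpleNamespace


def status_full_text(status):
    """Return the retweeted tweet's full text if status is a Retweet,
    otherwise the status's own full text."""
    retweeted = getattr(status, "retweeted_status", None)
    if retweeted is not None:
        return retweeted.full_text
    return status.full_text


# Stand-in objects for illustration only (not real Status objects).
plain = SimpleNamespace(full_text="my own tweet")
retweet = SimpleNamespace(
    full_text="RT @someone: a long twee…",
    retweeted_status=SimpleNamespace(full_text="a long tweet, in full"),
)

print(status_full_text(plain))
print(status_full_text(retweet))
```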
Upvotes: 7