I'm new to Python and I've been trying to fix this UnicodeEncodeError in my Reddit-to-Twitter bot for two hours now.
Here's the code:
import praw
import json
import requests
import tweepy
import time

access_token = 'REDACTED'
access_token_secret = 'REDACTED'
consumer_key = 'REDACTED'
consumer_secret = 'REDACTED'

def strip_title(title):
    if len(title) < 94:
        return title
    else:
        return title[:93] + "..."

def tweet_creator(subreddit_info):
    post_dict = {}
    post_ids = []
    print "[bot] Getting posts from Reddit"
    for submission in subreddit_info.get_hot(limit=20):
        post_dict[strip_title(submission.title)] = submission.url
        post_ids.append(submission.id)
    print "[bot] Generating short link using goo.gl"
    mini_post_dict = {}
    for post in post_dict:
        post_title = post
        post_link = post_dict[post]
        short_link = shorten(post_link)
        mini_post_dict[post_title] = short_link
    return mini_post_dict, post_ids

def setup_connection_reddit(subreddit):
    print "[bot] setting up connection with Reddit"
    r = praw.Reddit('yasoob_python reddit twitter bot '
                    'monitoring %s' % (subreddit))
    subreddit = r.get_subreddit(subreddit)
    return subreddit

def shorten(url):
    headers = {'content-type': 'application/json'}
    payload = {"longUrl": url}
    url = "https://www.googleapis.com/urlshortener/v1/url"
    r = requests.post(url, data=json.dumps(payload), headers=headers)
    link = json.loads(r.text)['id']
    return link

def duplicate_check(id):
    found = 0
    with open('posted_posts.txt', 'r') as file:
        for line in file:
            if id in line:
                found = 1
    return found

def add_id_to_file(id):
    with open('posted_posts.txt', 'a') as file:
        file.write(str(id) + "\n")

def main():
    subreddit = setup_connection_reddit('python')
    post_dict, post_ids = tweet_creator(subreddit)
    tweeter(post_dict, post_ids)

def tweeter(post_dict, post_ids):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    for post, post_id in zip(post_dict, post_ids):
        found = duplicate_check(post_id)
        if found == 0:
            print "[bot] Posting this link on twitter"
            print post+" "+post_dict[post]+" #python"
            api.update_status(post+" "+post_dict[post]+" #python")
            add_id_to_file(post_id)
            time.sleep(30)
        else:
            print "[bot] Already posted"

if __name__ == '__main__':
    main()
Traceback:
root@li732-134:~# python twitter.py
[bot] setting up connection with Reddit
[bot] Getting posts from Reddit
[bot] Generating short link using goo.gl
[bot] Already posted
[bot] Already posted
[bot] Already posted
[bot] Posting this link on twitter
Traceback (most recent call last):
  File "twitter.py", line 82, in <module>
    main()
  File "twitter.py", line 64, in main
    tweeter(post_dict, post_ids)
  File "twitter.py", line 74, in tweeter
    print post+" "+post_dict[post]+" #python"
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 39: ordinal not in range(128)
I really have no idea what to do. Could someone point me in the right direction?
Edit: Added code and traceback.
Upvotes: 1
Views: 7079
Reputation: 391
Even if you call decode(), the bytes you're receiving have to be in an expected, properly encoded form.
If \xea is encountered in a UTF-8 string, it must be followed by two more bytes, and not just any bytes: they have to be continuation bytes in the valid range (0x80 to 0xBF). Otherwise, it's not valid UTF-8.
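As a rough sketch of that rule (the function names below are hypothetical, not from any library): a three-byte UTF-8 sequence starts with a lead byte of the form 1110xxxx (0xE0 to 0xEF, which includes 0xEA) and must be followed by exactly two continuation bytes of the form 10xxxxxx (0x80 to 0xBF). It ignores the tighter second-byte ranges that apply after 0xE0 and 0xED, so treat it as an illustration rather than a complete validator.

```python
def is_continuation(byte):
    """True if byte has the bit pattern 10xxxxxx, i.e. 0x80 to 0xBF."""
    return byte & 0xC0 == 0x80


def check_three_byte_sequence(data):
    """Rough check that data is exactly one three-byte UTF-8 sequence.

    Simplification: does not enforce the extra restrictions on the
    second byte after lead bytes 0xE0 and 0xED.
    """
    data = bytearray(data)  # index as integers on both Python 2 and 3
    if len(data) != 3:
        return False
    if data[0] & 0xF0 != 0xE0:  # lead byte must look like 1110xxxx
        return False
    return is_continuation(data[1]) and is_continuation(data[2])


print(check_three_byte_sequence(b"\xea\x80\x80"))  # True
print(check_three_byte_sequence(b"\xea\x80"))      # False: a byte is missing
print(check_three_byte_sequence(b"\xea\x41\x80"))  # False: 0x41 is not a continuation byte
```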
For example, here are two Unicode code points. The first, U+0056, takes only a single byte in UTF-8. The next one, U+A000, requires three bytes, and we know that because its first byte is \xea:
http://hexutf8.com/?q=0x560xea0x800x80
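The same two code points can be checked from Python itself. This small sketch uses only the built-in encode method and is written so it runs on Python 2.7 and Python 3:

```python
# Encode two code points as UTF-8 and inspect the resulting bytes
v_bytes = u"\u0056".encode("utf-8")     # U+0056, the letter 'V'
a000_bytes = u"\ua000".encode("utf-8")  # U+A000

# bytearray yields integer byte values on both Python 2 and 3
print([hex(b) for b in bytearray(v_bytes)])     # ['0x56']
print([hex(b) for b in bytearray(a000_bytes)])  # ['0xea', '0x80', '0x80']
```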
Remove the last continuation byte from that sequence, and it is no longer valid UTF-8:
http://hexutf8.com/?q=0x560xea0x80
I don't see the actual value you're failing on in your post, but I'd double-check it and make sure you're actually receiving valid UTF-8 data.
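One way to do that double-check is to attempt a strict UTF-8 decode and catch UnicodeDecodeError. The helper name `is_valid_utf8` is hypothetical; the byte strings are the two sequences from the links above:

```python
def is_valid_utf8(data):
    """Return True if the byte string data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"\x56\xea\x80\x80"))  # True: 'V' followed by U+A000
print(is_valid_utf8(b"\x56\xea\x80"))      # False: last continuation byte missing
```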
Upvotes: 1
Reputation: 122296
The error happens here:
print post+" "+post_dict[post]+" #python"
The problem seems to be that this line concatenates regular (byte) strings with Unicode strings. Try concatenating only Unicode strings:
print post + u" " + post_dict[post] + u" #python"
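A minimal sketch of that idea, mirroring the loop in tweeter() but without any network calls. The dictionary below is a hypothetical stand-in for post_dict, with every title and link already a Unicode string:

```python
# Hypothetical stand-in for post_dict: titles mapped to short links, all Unicode
post_dict = {
    u"Cr\xeape recipes in Python": u"https://example.com/abc",
}

tweets = []
for post in post_dict:
    # Only Unicode pieces are joined, so Python 2 never falls back to an implicit ASCII decode
    tweets.append(post + u" " + post_dict[post] + u" #python")
```

This only builds the tweet text; whether `print` can then display a character such as u'\xea' still depends on the output encoding of the terminal.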
If you're still having problems, look at the output of type(post) and type(post_dict[post]), which should both be Unicode strings. If either of them isn't, you'll need to convert it to a Unicode string using the correct encoding (most likely UTF-8). That can be done as follows:
post.decode('UTF-8')
or:
post_dict[post].decode('UTF-8')
The above converts a regular string to a Unicode string in Python 2. Once you've done that, you can safely concatenate the Unicode strings together. The key rule in Python 2 is never to mix regular strings with Unicode strings, because Python then converts the regular string implicitly using the ASCII codec, which fails on any non-ASCII byte.
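Putting the type check and the decode step together, a small hypothetical helper `to_unicode` could normalize each piece before concatenation. It relies on `bytes` being an alias of `str` in Python 2.6+, so the same isinstance check works on Python 2 and 3; the sample title and link are made up:

```python
def to_unicode(value, encoding="utf-8"):
    """Decode byte strings with the given encoding; return text unchanged."""
    if isinstance(value, bytes):  # Python 2 str / Python 3 bytes
        return value.decode(encoding)
    return value


post = to_unicode(b"Cr\xc3\xaape recipes in Python")  # hypothetical UTF-8 title bytes
short_link = to_unicode(u"https://example.com/abc")   # already Unicode, passed through

tweet = post + u" " + short_link + u" #python"
print(type(post) is type(u""))  # True
```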
Upvotes: 0