Reputation: 4797
I wrote some code in Python 3.5 that uses Tweepy and SQLAlchemy to load tweets into a database. The following lines worked well:
twitter = Twitter(str(tweet.user.name).encode('utf8'), str(tweet.text).encode('utf8'))
session.add(twitter)
session.commit()
Running the same code under Python 2.7 raises an error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128)
What's the solution? My MySQL configuration is as follows:
Server side --> utf8mb4 encoding
Client side --> create_engine('mysql+pymysql://abc:def@abc/def', encoding='utf8', convert_unicode=True)
UPDATE
It seems that there is no solution, at least not with Python 2.7 + SQLAlchemy. Here is what I have found out so far; if I am wrong, please correct me.
Tweepy, at least in Python 2.7, returns unicode type objects.
In Python 2.7: tweet = u'☠'
is of type <type 'unicode'>
In Python 3.5: tweet = u'☠'
is of type <class 'str'>
This means Python 2.7 will give me a UnicodeEncodeError if I do str(tweet)
because Python 2.7 then tries to encode the character '☠' as ASCII, which is not possible, because ASCII covers only 128 basic characters.
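The failure described above can be reproduced with a minimal sketch. It is written for Python 3, where the ASCII encoding has to be requested explicitly; in Python 2.7, str() on a unicode object attempts the same ASCII encoding implicitly. The sample text and variable names are only for illustration.

```python
# Python 3 sketch: reproduce the ASCII encoding failure explicitly.
# In Python 2.7, str(u'...') attempts this same ASCII encoding implicitly.
text = u'Ahoy\u2026'  # illustrative text ending in U+2026 (horizontal ellipsis)

try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    failed_codec = exc.encoding  # name of the codec that rejected the character

# Encoding the same text as UTF-8 succeeds.
utf8_bytes = text.encode('utf-8')
```

So the error comes from the implicit ASCII step, not from the database layer.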
Conclusion:
Using just this statement tweet.user.name
in the SQLAlchemy line gives me the following error:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-4: ordinal not in range(256)
Using either this statement tweet.user.name.encode('utf-8')
or this one str(tweet.user.name.encode('utf-8'))
in the SQLAlchemy line should actually work the right way, but it shows me garbled characters on the database side:
ð´ââ ï¸Jack Sparrow
This is what I want it to show:
Printed: 🏴☠️ Jack Sparrow
Special characters unicode: u'\U0001f3f4\u200d\u2620\ufe0f'
Special characters UTF-8 encoding: '\xf0\x9f\x8f\xb4\xe2\x80\x8d\xe2\x98\xa0\xef\xb8\x8f'
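The garbled output looks consistent with correct UTF-8 bytes being interpreted as Latin-1 somewhere between the client and the table. Here is a minimal Python 3 sketch of that effect, assuming Latin-1 is the misapplied encoding (the charset actually used by the connection in this setup is not confirmed):

```python
# The special characters from the user name, as Unicode code points.
name = u'\U0001f3f4\u200d\u2620\ufe0f'

# Encoding as UTF-8 yields the byte sequence quoted above.
utf8_bytes = name.encode('utf-8')

# Assumption: something downstream reads those bytes as Latin-1.
# Each byte then becomes one character, which produces the garbled text.
mojibake = utf8_bytes.decode('latin-1')

# The bytes themselves are intact, so reversing the wrong step recovers the name.
recovered = mojibake.encode('latin-1').decode('utf-8')
```

If this is what happens, the stored bytes are not lost, only mislabeled, which points at the connection charset and not at the Python string.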
Upvotes: 0
Views: 1199
Reputation: 142528
Do not use any encode/decode functions; they only compound the problems.
Do set the connection to be UTF-8.
Do set the column/table to utf8mb4 instead of utf8.
Do use # -*- coding: utf-8 -*-
at the beginning of Python code.
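As a rough sketch of the "set the connection to be UTF-8" point: with SQLAlchemy and PyMySQL, the connection charset can be requested through a charset parameter in the URL query string. The helper name below is hypothetical, and the credentials are the placeholders from the question. The create_engine call is left as a comment because it needs SQLAlchemy and PyMySQL installed.

```python
# -*- coding: utf-8 -*-

def mysql_utf8mb4_url(user, password, host, database):
    """Hypothetical helper: build a PyMySQL URL that requests utf8mb4."""
    return 'mysql+pymysql://{0}:{1}@{2}/{3}?charset=utf8mb4'.format(
        user, password, host, database)

url = mysql_utf8mb4_url('abc', 'def', 'abc', 'def')

# With SQLAlchemy and PyMySQL installed, the engine would be created like this:
#   from sqlalchemy import create_engine
#   engine = create_engine(url)
# The unicode values are then passed through unchanged, with no encode/str calls:
#   twitter = Twitter(tweet.user.name, tweet.text)
```

Note that the encoding='utf8' argument in the question configures SQLAlchemy's own string handling; it does not by itself set the charset of the MySQL connection.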
More Python tips. Note that that page links to "Python 2.7 issues; improvements in Python 3".
Upvotes: 0