Reputation: 728
How do I encode something in utf8mb4 in Python?
I have two sets of data: data I am migrating to my new MySQL database over from Parse, and data going forward (that talks only to my new database). My database is utf8mb4 in order to store emoji and accented letters.
The first set of data only shows up correctly (when emoji and accents are involved) when I have in my python script:
MySQLdb.escape_string(unicode(xstr(data.get('message'))).encode('utf-8'))
and when reading from the MySQL database in PHP:
$row["message"] = utf8_encode($row["message"]);
The second set of data only shows up correctly (when emoji and accents are involved) when I DON'T include the utf8_encode($row["message"]) portion. I am trying to reconcile these so that both sets of data are returned correctly to my iOS app. Please help!
Upvotes: 13
Views: 32527
Reputation: 1123420
MySQL's utf8mb4 encoding is just standard UTF-8.
They had to add that name, however, to distinguish it from MySQL's older, broken utf8 character set, which only supports BMP characters (at most 3 bytes per character).
In other words, from the Python side you should always encode to UTF-8 when talking to MySQL, but take into account that the database may not be able to handle Unicode codepoints beyond U+FFFF unless you use utf8mb4 on the MySQL side.
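To make the BMP limit concrete, here is a minimal sketch in plain Python 3 (no database needed). The helper names `utf8_length` and `fits_in_three_byte_utf8` are made up for this illustration; they are not part of MySQLdb or MySQL.

```python
# Code points up to U+FFFF (the BMP) need at most 3 bytes in UTF-8.
# Code points beyond U+FFFF, such as most emoji, need 4 bytes,
# which is why a 3-byte-only character set cannot store them.

def utf8_length(ch):
    """Return the number of bytes the text takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_byte_utf8(text):
    """Return True if every code point in text lies in the BMP (<= U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    for ch in ["e", "\u00e9", "\u20ac", "\U0001F600"]:
        print("U+%04X needs %d byte(s)" % (ord(ch), utf8_length(ch)))
```

A check like `fits_in_three_byte_utf8(message)` could be used to spot rows that would need a utf8mb4 column, under the assumption that the rest of the pipeline is already UTF-8.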
However, generally speaking, you want to avoid manually encoding and decoding, and instead leave it to MySQLdb to worry about this. You do this by configuring your connection and your collations to handle Unicode text transparently. For MySQLdb, that means setting charset='utf8mb4':
database = MySQLdb.connect(
    host=hostname,
    user=username,
    passwd=password,
    db=databasename,
    charset="utf8mb4"
)
Then use normal Python 3 str strings; leave the use_unicode option set to its default, True*.
(Note: this handles SET NAMES and SET character_set_connection for you; there is no need to issue those manually.)
* Unless you still use Python 2; then the default is False. Set it to True and use u'...' unicode strings.
Upvotes: 27
Reputation: 578
With mysql.connector, you can also specify the character set you want when connecting:
mysql.connector.connect(host = '<host>', database = '<db>', user = '<user>', password = '<password>', charset = 'utf8')
The fields inside '<>' are your own details. Instead of 'utf8' you can also write 'utf8mb4', depending on the character set your MySQL database uses.
Upvotes: 2
Reputation: 974
use_unicode=True didn't work for me.
My solution:
MySQLdb.connect(host='###' [...], charset='utf8')
dbCursor.execute('SET NAMES utf8mb4')
dbCursor.execute("SET CHARACTER SET utf8mb4")
Upvotes: 2
Reputation: 2611
I have struggled myself with the correct exchange of the full range of UTF-8 characters between Python and MySQL for the sake of Emoji and other characters beyond the U+FFFF codepoint.
To be sure that everything worked fine, I had to do the following:
Make sure utf8mb4 is used for CHAR, VARCHAR, and TEXT columns in MySQL.
To enforce UTF-8 in Python, add the following line as the first or second line of your Python script:
# -*- coding: utf-8 -*-
To enforce UTF-8 between Python and MySQL, set up the MySQL connection as follows:
# Connect to mysql.
dbc = MySQLdb.connect(host='###', user='###', passwd='###', db='###', use_unicode=True)
# Create a cursor.
cursor = dbc.cursor()
# Enforce UTF-8 for the connection.
cursor.execute('SET NAMES utf8mb4')
cursor.execute("SET CHARACTER SET utf8mb4")
cursor.execute("SET character_set_connection=utf8mb4")
# Do database stuff.
# Commit data.
dbc.commit()
# Close cursor and connection.
cursor.close()
dbc.close()
This way, you don't need to use functions such as encode and utf8_encode.
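As an illustration of why a stray utf8_encode can garble text, here is a small Python 3 sketch. PHP's utf8_encode converts ISO-8859-1 to UTF-8, so applying it to bytes that are already UTF-8 encodes them a second time. The function names `latin1_to_utf8` and `undo_double_encoding` are hypothetical helpers for this example only, and whether the migrated rows in the question are affected in exactly this way is an assumption that would need checking against the stored bytes.

```python
def latin1_to_utf8(raw):
    """Interpret raw bytes as ISO-8859-1 and re-encode them as UTF-8.

    This mimics what PHP's utf8_encode does to its input.
    """
    return raw.decode("latin-1").encode("utf-8")


def undo_double_encoding(text):
    """Reverse one round of accidental Latin-1 -> UTF-8 re-encoding."""
    return text.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "caf\u00e9 \U0001F600"
    already_utf8 = original.encode("utf-8")

    # Applying the conversion to data that is already UTF-8 garbles it.
    garbled = latin1_to_utf8(already_utf8).decode("utf-8")
    print(repr(garbled))

    # The garbling is reversible as long as no bytes were lost on the way.
    print(undo_double_encoding(garbled) == original)
```

If some rows turn out to be double-encoded, a one-off repair along the lines of `undo_double_encoding` may be simpler than keeping two different read paths in PHP.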
Upvotes: 28