Pol
Pol

Reputation: 4028

SQLAlchemy reports "Invalid utf8mb4 character string" for BINARY column

Assuming this MySQL table schema:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `uuid` binary(16) NOT NULL,
  `email` varchar(255) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `photo` binary(16) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uuid` (`uuid`),
  UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8mb4;

When I use the execute() API from SQLAlchemy connection class as such:

with self.engine.begin() as connection:
  user_uuid = uuid.UUID("...")
  result = connection.execute("SELECT email, name, photo FROM user WHERE uuid=%s", user_uuid.bytes)

If the UUID is F393A167-A919-4B50-BBB7-4AD356E89E6B, then SQLAlchemy prints this warning:

/site-packages/sqlalchemy/engine/default.py:450: Warning: Invalid utf8mb4 character string: 'F393A1'

The uuid column is a BINARY column, so why is SQLAlchemy considering this parameter a text one instead of a binary one and how to prevent this?

Upvotes: 3

Views: 2286

Answers (2)

RazerM
RazerM

Reputation: 5492

The problem doesn't happen on Python 3, so I think that the problem is that the database driver is unable to distinguish what should be bytes given the Python 2 str type.

Regardless, it seems using SQLAlchemy core directly works correctly, presumably because it knows the column type directly.

from sqlalchemy import MetaData, Table, select

meta = MetaData()
user = Table('user', meta, autoload_with=engine)
result = select([user]).where(user.c.uuid == user_uuid.bytes)

If you wish to continue executing a string, you can cast to bytesarray like SQLAlchemy appears to be doing:

with self.engine.begin() as connection:
    user_uuid = uuid.UUID("...")
    result = connection.execute(
        "SELECT email, name, photo FROM user WHERE uuid=%s",
        bytearray(user_uuid.bytes))

Or to tell SQLAlchemy what type the bound parameter is to get this automatically:

from sqlalchemy import text, bindparam, BINARY

with self.engine.begin() as connection:
    user_uuid = uuid.UUID("...")
    stmt = text("SELECT email, name, photo FROM user WHERE uuid = :uuid")
    stmt = stmt.bindparams(bindparam('uuid', user_uuid.bytes, type_=BINARY))
    result = connection.execute(stmt)

Upvotes: 1

Pol
Pol

Reputation: 4028

The explanation and solution is actually in this bug report in MySQL:

replace:

cursor.execute(""" INSERT INTO user (uuid) VALUES (%s) """, my_uuid)

with

cursor.execute(""" INSERT INTO user (uuid) VALUES (_binary %s) """, my_uuid)

Mind the underscore. It's "_binary", not "binary". This "_binary" tells MySQL that the following string is to be interpreted as binary, not to be interpreted/validated as utf8.

Upvotes: 6

Related Questions