Hayk
Hayk

Reputation: 687

detect with Python if the string will lead to "Incorrect string value" error in MySQL

I have a table in MySQL (5.7) database, which has collation utf8_unicode_ci, and where I'm inserting some data with Python (3.6).

With some of the strings (for example, '\xCE\xA6') I get "Incorrect string value" error. On the DB side, I can mute this error by turning off the strict mode in MySQL, or changing the field's collation to utf8mb4.

However, such strings are "anomalies", and it is not desirable to change a collation or the sql_mode.

How can I detect in Python 3, that a given string will lead to "incorrect string value" error with MySQL, before inserting into a Table ?

Upvotes: 1

Views: 527

Answers (1)

Rick James
Rick James

Reputation: 142366

Where do you get the error message? What operation is being performed?

C3A6 is the UTF-8 (cf MySQL's utf8 or utf8mb4) hex for æ; does it seem likely that that was the desired character?

To handle utf8 (or utf8mb4), you need to determine what the client's encoding. Sounds like UTF-8. So, when connecting to MySQL, tell it that -- use these in the connect call:

charset="utf8", use_unicode=True

If the character is in the python source, you need

# -*- coding: utf-8 -*-

at the beginning of the source.

Also the column you are inserting into needs to be CHARACTER SET utf8 (or utf8mb4).

utf8mb4 is needed for Emoji and some of Chinese; otherwise it is 'equivalent' to utf8.

Do not use decode() or any other conversion functions; that will just make things harder to fix. In this arena, two wrongs does not make a right; it makes a worse mess.

If you have other symptoms of garbled characters, see Trouble with UTF-8 characters; what I see is not what I stored

To discuss further, please provide the connection call, the SQL statement involved, SHOW CREATE TABLE, and anything else involved.

C3A6 is a valid utf8/utf8mb4 character æ, and could be interpreted as valid, though unlikely, latin1 æ. But it is invalid for CHARACTER SET ascii. (I don't know how the error message occurred unless the connection said ascii or some obscure charset.)

Upvotes: 2

Related Questions