Reputation: 687
I have a table in MySQL (5.7) database, which has collation utf8_unicode_ci, and where I'm inserting some data with Python (3.6).
With some of the strings (for example, '\xCE\xA6') I get "Incorrect string value" error. On the DB side, I can mute this error by turning off the strict mode in MySQL, or changing the field's collation to utf8mb4.
However, such strings are "anomalies", and it is not desirable to change a collation or the sql_mode.
How can I detect in Python 3, that a given string will lead to "incorrect string value" error with MySQL, before inserting into a Table ?
Upvotes: 1
Views: 527
Reputation: 142366
Where do you get the error message? What operation is being performed?
C3A6
is the UTF-8 (cf MySQL's utf8 or utf8mb4) hex for æ
; does it seem likely that that was the desired character?
To handle utf8 (or utf8mb4), you need to determine what the client's encoding. Sounds like UTF-8. So, when connecting to MySQL, tell it that -- use these in the connect call:
charset="utf8", use_unicode=True
If the character is in the python source, you need
# -*- coding: utf-8 -*-
at the beginning of the source.
Also the column you are inserting into needs to be CHARACTER SET utf8
(or utf8mb4).
utf8mb4
is needed for Emoji and some of Chinese; otherwise it is 'equivalent' to utf8
.
Do not use decode()
or any other conversion functions; that will just make things harder to fix. In this arena, two wrongs does not make a right; it makes a worse mess.
If you have other symptoms of garbled characters, see Trouble with UTF-8 characters; what I see is not what I stored
To discuss further, please provide the connection call, the SQL statement involved, SHOW CREATE TABLE
, and anything else involved.
C3A6
is a valid utf8/utf8mb4 character æ
, and could be interpreted as valid, though unlikely, latin1 æ
. But it is invalid for CHARACTER SET ascii
. (I don't know how the error message occurred unless the connection said ascii or some obscure charset.)
Upvotes: 2