runawaykid
runawaykid

Reputation: 1441

Uploading Python Pandas dataframe to MySQL - InternalError: 1366, "Incorrect String Value"

I am trying to write a Pandas dataframe into MySQL database, and have used the following code:

engine = sqlalchemy.create_engine("mysql+pymysql://root:password@localhost/skills?charset=utf8mb4")
connection = engine.connect
dataframe.head().to_sql('indeed_resumes', engine, flavor='mysql', if_exists='replace',index=True)

However, I get the following error:

InternalError: (1366, "Incorrect string value: '\\xE1\\xBB\\x99i\\x0AO...' for column 'work' at row 5")

The datatype of the MySQL table is as follows:

 mysql> desc indeed_resumes;
    +-----------+------------+------+-----+---------+-------+
    | Field     | Type       | Null | Key | Default | Extra |
    +-----------+------------+------+-----+---------+-------+
    | index     | bigint(20) | YES  | MUL | NULL    |       |
    | certs     | text       | YES  |     | NULL    |       |
    | contact   | text       | YES  |     | NULL    |       |
    | education | text       | YES  |     | NULL    |       |
    | headline  | text       | YES  |     | NULL    |       |
    | info      | text       | YES  |     | NULL    |       |
    | skills    | text       | YES  |     | NULL    |       |
    | summary   | text       | YES  |     | NULL    |       |
    | updated   | text       | YES  |     | NULL    |       |
    | work      | text       | YES  |     | NULL    |       |
    +-----------+------------+------+-----+---------+-------+
    10 rows in set (0.00 sec)

My data consists of very long strings (sometimes around 3000 characters), so this could be causing the error. Any suggestions?

Upvotes: 5

Views: 3434

Answers (1)

runawaykid
runawaykid

Reputation: 1441

I seem to have resolved this issue. It looks like I also needed to change the database encoding using the following commands.

ALTER DATABASE skills CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE indeed_resumes CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

From https://mathiasbynens.be/notes/mysql-utf8mb4:

"Turns out MySQL’s utf8 charset only partially implements proper UTF-8 encoding. It can only store UTF-8-encoded symbols that consist of one to three bytes; encoded symbols that take up four bytes aren’t supported.

Luckily, MySQL 5.5.3 (released in early 2010) introduced a new encoding called utf8mb4 which maps to proper UTF-8 and thus fully supports Unicode, including astral symbols."

Upvotes: 11

Related Questions