user3607534
user3607534

Reputation: 29

How to store emojis in a MySQL table? Tried everything

I have a CSV file containing tweets with emojis (eg. "Cool! 💕") and I need to import them into a MySQL table in such a way they will be saved/displayed correctly... What do I have to set up and how for a correct import (I mean collation, etc.)?

More details:

I already tried:

Upvotes: 2

Views: 803

Answers (1)

Bill Karwin
Bill Karwin

Reputation: 562230

The table must encode text in character set utf8mb4 to store emojis.

Demo:

mysql> create table no ( t text ) character set=utf8;

mysql> load data local infile 'm.csv' into table no;

mysql> select * from no;
+---------+
| t       |
+---------+
| Cool! ? |
+---------+

So utf8 does not support emojis.

mysql> create table yes ( t text ) character set=utf8mb4;

mysql> load data local infile 'm.csv' into table yes;

mysql> select * from yes;
+------------+
| t          |
+------------+
| Cool! 💕     |
+------------+

But utf8mb4 does support emojis. The difference is that utf8mb4 supports 4-byte encodings, but utf8 doesn't. This is an unfortunate part of MySQL's history, that they didn't implement utf8 originally to support the Supplemental Multilingual Plane of the UTF-8 standard.

Let's see if altering the first table helps.

mysql> alter table no character set utf8mb4;

mysql> load data local infile 'm.csv' into table no;

mysql> select * from no;
+---------+
| t       |
+---------+
| Cool! ? |
| Cool! ? |
+---------+

Why didn't this work? Because alter table ... character set does not convert existing columns. It only changes the table's default character set, which will not be used until the next time we add a column to that table.

We can see that the existing column is still using the old character set:

mysql> show create table no\G
*************************** 1. row ***************************
       Table: no
Create Table: CREATE TABLE `no` (
  `t` text CHARACTER SET utf8mb3  
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

utf8mb3 is the character set that utf8 is an alias for in MySQL 8.0.

To convert existing columns, use:

mysql> alter table no convert to character set utf8mb4;

mysql> show create table no\G
*************************** 1. row ***************************
       Table: no
Create Table: CREATE TABLE `no` (
  `t` mediumtext
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

No try the load again:

mysql> load data local infile 'm.csv' into table no;

mysql> select * from no;
+------------+
| t          |
+------------+
| Cool! ?    |
| Cool! ?    |
| Cool! 💕     |
+------------+

Note that someday, MySQL may change the 'utf8' alias to mean utf8mb4. This is shown in warnings on many of the above usages of 'utf8':

'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.

Upvotes: 2

Related Questions