Reputation: 838
I am experiencing some issue with MySQL encoding parsing data and then storing data from the Twitter API.
The tweet that struggles to get stored into the db is:
INSERT INTO `statuses` (`status_id`,`text`) VALUES('93332222111111','The beers are on me in this case!�')
The �
character is this one.
whereas the following got stored successfully:
INSERT INTO `statuses` (`status_id`,`text`) VALUES('485072105225921','RT @someone: 🔥 Don\'t forget to index timestamp columns like \"created_at\" if you query against them.\n\ne.g.: ORDER BY created_at\ne.g.: WH')
Let's have a look into the character set:
SHOW VARIABLES LIKE 'character_set%'
which brings back
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8mb4
character_set_filesystem binary
character_set_results utf8mb4
character_set_server utf8
character_set_system utf8
character_sets_dir /usr/local/Cellar/mysql/5.7.18_1/share/mysql/charsets/
Anything obvious that I am missing?
Update: Also the following if block:
if utf8.ValidString(strings.Join(values, ",")) == false {
fmt.Println(strings.Join(values, ","))
}
returns:
'The beers are on me in this case!�','943304851980963841'
Upvotes: 0
Views: 5259
Reputation: 142433
Do not use utf16 for anything.
Use MySQL's CHARACTER SET utf8mb4
; it is equivalent to the outside world's UTF-8
, and includes the characters that begin with hex F0
. (MySQL's utf8
does not include them.)
🔥
, hex F09F94A5
needs utf8mb4. Ditto for 🍻
, hex F09F8DBB
.
�
indicates that something is already messed up. That is, the INSERT
was given a bad character.
Please provide SELECT HEX(text) ...
to see if you get one of those hex values.
Also see "black diamond" in Trouble with UTF-8 characters; what I see is not what I stored
Upvotes: 0
Reputation: 3782
Two suggestions possible to solve your problem:
UTF16
charset;utf8mb4
as char set and utf8mb4_unicode_ci
as collation.You can use the following code as an example, extracted from an online tutorial:
# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
As your problem is not with the database, you have to use the right code representing the image. I suggest you use "emoji-java": a lightweight java library that helps you use Emojis in your java applications.
An example:
String str = "An :grinning:awesome :smiley:string 😄with a few :wink:emojis!";
String result = EmojiParser.parseToUnicode(str);
System.out.println(result);
// Prints:
// "An 😀awesome 😃string 😄with a few 😉emojis!"
Another edit: You just tell now the language you are using: Go. In this case, you can take a look at Go-emoji here. Or even this another emoji project.
Upvotes: 1