bkn
bkn

Reputation: 21

How to persist text containing html emoji in mysql through hibernate

I am adding the emojis using their html codes (example: & # 1 2 8 0 7 0 ; ) using a javascript function in a contenteditable html element which displays the emoji (example: 👆)

I then retrieve the content of the element with

var content = el.innerText;

I add the content in a json

var myjson = {"content" : content};

I send the json through ajax with content-type being

application/json

In, the backend i convert the json to a pojo using Gson, then persist it in the database using hibernate

In the servlet, using system.out.println, the content displays as ð in the db, a select query displays it as ð (with small numbers inside the rectangles)

I spent hours searching, setting the character set as utf8mb4 in the DB as well as the hibernate config file...example as described in the following links

Caused by: java.sql.SQLException: Unsupported character encoding 'utf8mb4'

How to store Emoji Character in MySQL Database

I'm a bit lost with it. My need is to persist a text containing emoji and be able to retrieve it(through hibernate or simple jdbc) and display it in a webpage.

What would be the best approach?

Upvotes: 1

Views: 1054

Answers (2)

Yogesh Chawla
Yogesh Chawla

Reputation: 35

Try this at database level:

Modify databases, tables, and columns Change the character set and collation properties of the databases, tables, and columns to use utf8mb4 instead of utf8.

For each database:

ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

For each table:

ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

For each column:

ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Check the maximum length of columns and index keys

show VARIABLES like 'ver%'

SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';

====result should be==== character_set_client utf8mb4 character_set_connection utf8mb4 character_set_database utf8 character_set_filesystem binary character_set_results utf8mb4 character_set_server utf8 character_set_system utf8 collation_connection utf8mb4_general_ci collation_database utf8_general_ci collation_server utf8_general_ci

Repair and optimize all tables For each table

REPAIR TABLE table_name; OPTIMIZE TABLE table_name; $ mysqlcheck -u root -p –auto-repair –optimize –all-databases

Try this at database level. Your problem might be linked to a database.

Upvotes: 1

Y2020-09
Y2020-09

Reputation: 1

Here, i got flustered because of the Java-Script which I have gotten away from in the past while... Play with this page, to see how Emoji's are represented.

CodePoint is not a change in representation, it is a change in interpretation... There is simply not enough space to explain eveything about escape sequences and character representation in a single Stack Oveflow answer... This is going to HiLite the major issues when retrieving an Emoji from an HTML Page.

<HTML>
<HEAD>
<TITLE>Emoji</TITLE>
<SCRIPT TYPE='text/javascript'>
function show()
{
    // Prints the EMOJI ITSELF... (I can't enter that here)
    var s = document.getElementById("thediv").innerHTML;
    alert(s);

    // Prints characters:  � � 
    //
    // NOTE: The UNICODE characters (55357 and 56390) are unprintable
    //       Therefore the question mark (�) is printed
    var s2 = " ";
    for (var i=0; i < s.length; i++) s2 = s2 + (s.charAt(i)) + ' ';
    alert(s2);
    
    // Prints Characters: 55357 56390 
    //
    // NOTE: This is guaranteed to be a number between 1 and 65,535
    //       which 2^16 - and UNICODE is a 16 bit space.
    var s3 = " ";
    for (var i=0; i < s.length; i++) s3 = s3 + (s.charCodeAt(i)) + ' ';
    alert(s3);
    
    // Prints Characters: 128070 56390 
    //
    // NOTE: here, the SECOND NUMBER is IRRELEVANT... It is "incorporated" in the first..
    //       As mentioned, interpreting two characters in a row using
    //       CODE-POINT means, the second number and the first are "Chomped" together.
    // The second number is called the "High Surrogate"
    var s4 = " ";
    for (var i=0; i < s.length; i++) s4 = s4 + (s.codePointAt(i)) + ' ';
    alert(s4);

    // Prints the String "&#128070;"
    //
    // YOU SHOULD PROBABLY BE SENDING THIS TO YOUR DATABASE...
    var s5 = "&#" + s.codePointAt(0) + ";";
    alert(s5);
}
</SCRIPT>
</HEAD>
<BODY>

<H1>Divider</H1>
<DIV ID="thediv">&#128070;</DIV>
<BR /><BR /><BUTTON onclick="show();">Click Me</BUTTON>
</BODY>
</HTML>

The above code will send the following java-script alerts:

document.getElementById("thediv").innerHTML; The rendered Emoji

s.charAt(i) The individual characters

s.charCodeAt(i) The Character Codes

s.codePointAt(i) The Code points

"&#" + s.codePointAt(0) + ";"; The HTML Escaped Emoji

Upvotes: 0

Related Questions