Reputation: 189
I have a script in PHP that stores values to a MySQL database from a web store. The store allows customers to leave a message which can create havoc when they use emojis. To prevent these characters from breaking my script I've used FILTER_SANITIZE_STRING
and FILTER_FLAG_STRIP_HIGH
on all my strings prior to sending them all to MySQL.
This works well except for when I display it again in a Java program I've written I'll have things like "I'm"
instead of "I'm"
.
Is there a way to have Java find and convert the ASCII values back into characters?
My current plan of attack is to have a function that takes each relevant string column, examines each word looking for &#
, finds the position of the simi-colon after the &#
, replaces that value with the corresponding ASCII character, and returns the new string.
It's doable, but I'm hoping there is an existing means to do this without re-inventing the wheel.
Edit: Thank you to @rzwitserloot for pointing me in the right direction, for anyone who sees this and does not read my comment in his answer, I ended up using JSoup. Here is a snippet of the final code section related to this on the Java side for anyone else working through this:
// Connect method opens a connection to the MySQL server
connect();
// Query the MySQL server
resultSet = statement.executeQuery("select * from order_tracking order by DateOrdered");
// If there is any result, iterate through them until the end is reached.
while (resultSet.next()) {
// Add each returned row into the list to send to the table
Jsoup.parse(resultSet.getString(2)).text()
.
.
.
}
The .text()
at the end of the Jsoup.parse(String)
gets rid of the html formatting (i.e. <Head><Body>
etc) that Jsoup automatically throws in and returns only the text portion with the &
(or whatever ascii value it might be) properly formatted.
Thanks!
Upvotes: 1
Views: 431
Reputation:
You have HTML-escaped entities in your database. This isn't ideal, but it's easy to reverse. Pass the string to html_entity_decode()
to reverse this process.
Upvotes: 0
Reputation: 103510
The best solution is to just fix the initial bit: Of course databases (and mysql in general) can store emojis, but mysql is weird. utf8 isn't utf8, it's misnamed. The real utf8 in mysql is called utf8mb4. Use that encoding and you can store smiley's just fine.
If that option somehow doesn't work for you, your strings are HTML encoded. The solution is to HTML-decode them. Java doesn't ship with one out of the box, you need a dependency. There's this, for example: http://commons.apache.org/proper/commons-lang/javadocs/api-3.1/org/apache/commons/lang3/StringEscapeUtils.html#unescapeHtml4(java.lang.String)
Upvotes: 1