Reputation: 183
Running into an issue when using EncodeForHTML for certain characters (Emojis in this case)
The text in this case is: ⌛️a😊b👍c😟 💥🍉🍔 💩 🤦🏼♀️🤦🏼♀️🤦🏼♀️ 😘
Now if I just a straight output
<cfoutput>#txt#</cfoutput>
It displays correctly, no issues, but if I use EncodeForHTML first
<cfoutput>#EncodeForHTML(txt)#</cfoutput>
I get this ⌛️a��b��c�� ������ �� ����♀️����♀️����♀️ ��
I tested it with EncodeForXML & esapiEncode as well to be sure; all are giving me the same result. I've verified the encoding settings in Lucee are UTF-8, and the meta charset tag is also set to UTF-8. I can't find any documenation re: EncodeForHTML saying if it make any changes to the character encoding, if it requires the character encoding to be something specific, or if it has any known issues with emojis or certain code points.
I appreciate any help or clarification anyone can provide.
Edit: Thank you everyone. Wish I could accept multiple answers.
Upvotes: 2
Views: 462
Reputation: 1472
Yes, ESAPI 2.2.0.0 addressed the issue of not correctly encoding non-BMP characters (see https://github.com/ESAPI/esapi-java-legacy/issues/300) as part of PR #413 that James mentioned above.
But I just uploaded release ESAPI 2.2.1.0-RC1 (release candidate 1) to Maven Central early this morning and hope to have an official 2.2.1.0 release out by next weekend, so if you are going to put in a ticket with Adobe for fix this with an updated version of ESAPI, I'd wait another week and then tell them to update to 2.2.1.0.
Upvotes: 2
Reputation: 4475
I was required to sanitize emojis in order ensure that third-party content was cross-compatible with external services. Some of the content contained emojis and was causing export/import problems. I wrote a ColdFusion wrapper for the emoji-java library to identify, sanitize and convert emojis.
https://github.com/JamoCA/cf-emoji-java
For example, the parseToAliases() function "replaces all the emoji's unicodes found in a string by their aliases".
emojijava = new emojijava();
emojijava.parseToAliases('I like 🍕'); // I like :pizza:
To "encode" you could use either the parseToHtmlDecimal() or parseToHtmlHexadecimal() functions prior to using EncodeForHTML().
emojijava = new emojijava();
test = emojijava.parseToHtmlDecimal('I like 🍕'); // I ❤️ 🍕
EncodeForHTML(test);
Upvotes: 5
Reputation: 11120
At the time of this writing, ColdFusion's latest version is 2018 update 9
In turn, it uses ESAPI 2.1.1
Recent release notes don't mention Emoji,
https://github.com/ESAPI/esapi-java-legacy/tree/develop/documentation
But they do mention in Pull request 413
"Fixing ESAPI's inability to handle non-BMP codepoints."
This dates from 2017
https://github.com/ESAPI/esapi-java-legacy/pull/413
So based on all this information, I would recommend doing both of the following
Try using ESAPI directly. This is how it was done before ESAPI was added to CF. This issue may or may not still exist in ESAPI
Put in a ticket with Adobe to update this library.
Upvotes: 3