Jacob FW

Reputation: 183

Coldfusion/Lucee Encoding Issue When Using EncodeForHTML

I'm running into an issue when using EncodeForHTML on certain characters (emojis in this case).

The text in this case is: ⌛️a😊b👍c😟 💥🍉🍔 💩 🤦🏼‍♀️🤦🏼‍♀️🤦🏼‍♀️ 😘

Now if I just do a straight output

<cfoutput>#txt#</cfoutput>

It displays correctly, no issues, but if I use EncodeForHTML first

<cfoutput>#EncodeForHTML(txt)#</cfoutput>

I get this ⌛️a��b��c�� ������ �� ����‍♀️����‍♀️����‍♀️ ��

I tested it with EncodeForXML and esapiEncode as well to be sure; all give me the same result. I've verified that the encoding settings in Lucee are UTF-8, and the meta charset tag is also set to UTF-8. I can't find any documentation on EncodeForHTML saying whether it makes any changes to the character encoding, whether it requires a specific character encoding, or whether it has any known issues with emojis or certain code points.

I appreciate any help or clarification anyone can provide.

Edit: Thank you everyone. Wish I could accept multiple answers.

Upvotes: 2

Views: 462

Answers (3)

Kevin W. Wall

Reputation: 1472

Yes, ESAPI 2.2.0.0 addressed the issue of not correctly encoding non-BMP characters (see https://github.com/ESAPI/esapi-java-legacy/issues/300) as part of PR #413 that James mentioned above.

But I just uploaded ESAPI 2.2.1.0-RC1 (release candidate 1) to Maven Central early this morning and hope to have an official 2.2.1.0 release out by next weekend. So if you are going to put in a ticket with Adobe to fix this with an updated version of ESAPI, I'd wait another week and then tell them to update to 2.2.1.0.
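To illustrate what "not correctly encoding non-BMP characters" can look like, here is a minimal Java sketch. It is not ESAPI's actual code; `SurrogateDemo`, `encodePerChar` and `encodePerCodePoint` are hypothetical names used only for this example. It models an encoder that walks a string one UTF-16 `char` at a time, so each half of an emoji's surrogate pair is handled on its own, and compares it with one that walks by code point.

```java
class SurrogateDemo {

    // Hypothetical per-char encoder: each UTF-16 unit is handled alone.
    // A lone surrogate is not a valid character, so it is replaced with
    // U+FFFD, the replacement character.
    static String encodePerChar(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isSurrogate(c)) {
                out.append('\uFFFD');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Code-point-aware version: a surrogate pair is read as one code point
    // and written back intact.
    static String encodePerCodePoint(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // "a😊b": U+1F60A lies outside the BMP and needs two UTF-16 chars.
        String txt = "a\uD83D\uDE0Ab";
        System.out.println(txt.length());                        // 4
        System.out.println(txt.codePointCount(0, txt.length())); // 3
        System.out.println(encodePerChar(txt));      // a, two U+FFFD, b
        System.out.println(encodePerCodePoint(txt)); // original string
    }
}
```

Under this model each emoji turns into two replacement characters while BMP characters such as ⌛ pass through, which matches the pattern in the question's output.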

Upvotes: 2

James Moberg

Reputation: 4475

I was required to sanitize emojis in order to ensure that third-party content was cross-compatible with external services. Some of the content contained emojis, which caused export/import problems. I wrote a ColdFusion wrapper for the emoji-java library to identify, sanitize and convert emojis.

https://github.com/JamoCA/cf-emoji-java

For example, the parseToAliases() function "replaces all the emoji's unicodes found in a string by their aliases".

emojijava = new emojijava();
emojijava.parseToAliases('I like 🍕');   // I like :pizza:

To "encode" you could use either the parseToHtmlDecimal() or parseToHtmlHexadecimal() functions prior to using EncodeForHTML().

emojijava = new emojijava();
test = emojijava.parseToHtmlDecimal('I like 🍕');   // I like &#127829;
EncodeForHTML(test);
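
For readers who cannot add a library, the decimal-entity idea can be sketched in plain Java, which CFML can call on the JVM. This is an assumption-laden illustration, not the emoji-java implementation and not a full replacement for EncodeForHTML; `NumericEntityDemo` and `toHtmlDecimal` are hypothetical names. It escapes the HTML-significant ASCII characters and writes every non-ASCII code point, including non-BMP ones, as a decimal numeric character reference.

```java
class NumericEntityDemo {

    // Walks the input by code point (not by UTF-16 char), so an emoji made
    // of a surrogate pair becomes a single reference such as &#127829;.
    static String toHtmlDecimal(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:
                    if (cp < 128) {
                        out.appendCodePoint(cp);              // plain ASCII
                    } else {
                        out.append("&#").append(cp).append(';'); // decimal reference
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "\uD83C\uDF55" is 🍕 (U+1F355, decimal 127829).
        System.out.println(toHtmlDecimal("I like \uD83C\uDF55 <b>"));
        // prints: I like &#127829; &lt;b&gt;
    }
}
```

Because this sketch already escapes the markup characters, its result is meant to be written out as-is; passing it through another HTML encoder would escape the ampersands of the references a second time.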

Upvotes: 5

James A Mohler

Reputation: 11120

At the time of this writing, ColdFusion's latest version is 2018 update 9.

In turn, it uses ESAPI 2.1.1.

Recent release notes don't mention emoji:

https://github.com/ESAPI/esapi-java-legacy/tree/develop/documentation

But they do mention pull request 413:

"Fixing ESAPI's inability to handle non-BMP codepoints."

This dates from 2017:

https://github.com/ESAPI/esapi-java-legacy/pull/413


So based on all this information, I would recommend doing both of the following:

  1. Try using ESAPI directly. This is how it was done before ESAPI was added to CF. The issue may or may not still exist in ESAPI.

  2. Put in a ticket with Adobe to update this library.

Upvotes: 3
