basZero

Reputation: 4284

Safely prepare a String with Emoji Icons in Java for XML and XSLT Transformation

You receive a string (UTF-8) that may contain any kind of character, including emoticons/emoji such as 👍 🏁. You have to generate an XML element containing that string and pass it to an XSLT transformation engine.

Because I get transformation errors, I wonder how the Java code should process the string before inserting it into the final XML so that the XSLT transformation does not fail.

What I currently have in Java is this:

String inputValue = ...; // string received from an external client
Element target = ...; // XML element to which the string must be added
String xml10pattern = "[^"
                    + "\u0009\r\n"
                    + "\u0020-\uD7FF"
                    + "\uE000-\uFFFD"
                    + "\ud800\udc00-\udbff\udfff"
                    + "]"; // matches every character that is illegal in XML 1.0
inputValue = inputValue.replaceAll(xml10pattern, "");
target.setAttribute("text", inputValue);

Is anything still missing to make this safer?

Upvotes: 0

Views: 970

Answers (2)

basZero

Reputation: 4284

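A cheap option would be to additionally strip every character outside the Latin-1 range (U+0000 to U+00FF), so that only plain text, including line breaks, is passed on. Note that this range is wider than ASCII; use "[^\\x00-\\x7F]" instead to keep strictly ASCII: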

String inputValue = ...; // string received from an external client
Element target = ...; // XML element to which the string must be added
String xml10pattern = "[^"
                    + "\u0009\r\n"
                    + "\u0020-\uD7FF"
                    + "\uE000-\uFFFD"
                    + "\ud800\udc00-\udbff\udfff"
                    + "]"; // matches every character that is illegal in XML 1.0
inputValue = inputValue.replaceAll(xml10pattern, "");
inputValue = inputValue.replaceAll("[^\\x00-\\xFF]", "");
target.setAttribute("text", inputValue);

Any thoughts on this?
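
For reference, here is a minimal sketch of the same two filters written against code points instead of a regex, which makes it easier to see that emoji survive the XML 1.0 filter and are only dropped by the Latin-1 step. The class and method names (XmlTextSanitizer, stripInvalidXmlChars, keepLatin1Only) are hypothetical and only serve the illustration:

```java
final class XmlTextSanitizer {

    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops every code point that XML 1.0 does not allow. Unpaired surrogates
    // show up as code points in 0xD800-0xDFFF and are therefore dropped too.
    static String stripInvalidXmlChars(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints()
             .filter(XmlTextSanitizer::isXml10Char)
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Keeps only code points up to U+00FF (Latin-1); this also removes emoji.
    static String keepLatin1Only(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints()
             .filter(cp -> cp <= 0xFF)
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // "ok", a thumbs-up emoji, a BEL control character, then "café"
        String raw = "ok \uD83D\uDC4D\u0007 caf\u00E9";
        String xmlSafe = stripInvalidXmlChars(raw); // BEL removed, emoji kept
        String latin1 = keepLatin1Only(xmlSafe);    // emoji removed as well
        System.out.println(xmlSafe);
        System.out.println(latin1);
    }
}
```

The first step alone should already be enough for well-formed XML 1.0; the second step only makes sense if the downstream engine really cannot handle characters beyond Latin-1.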

Upvotes: 0

Joop Eggen

Reputation: 109593

Apache Commons Lang has StringEscapeUtils.escapeXml(string). It escapes characters such as &, which lets you have them in your attribute.
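
One thing worth checking before escaping by hand: when the value is set through the DOM API (as with target.setAttribute in the question), the serializer normally escapes markup characters itself, so escaping first may lead to double escaping. The following JDK-only sketch shows this with an identity transformation; the class name AttributeEscapingDemo, the method name, and the element name "message" are assumptions made for the example:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class AttributeEscapingDemo {

    // Builds <message text="..."/> via DOM and serializes it with an
    // identity transformer, returning the resulting XML text.
    static String serializeWithAttribute(String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element target = doc.createElement("message");
            doc.appendChild(target);
            target.setAttribute("text", value); // raw, unescaped value

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked exceptions.
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // The ampersand is written as &amp; by the serializer.
        System.out.println(serializeWithAttribute("Tom & Jerry"));
    }
}
```

If the XML is instead assembled by string concatenation, explicit escaping such as the Commons method above is needed.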

Upvotes: 1
