Reputation: 4284
You receive a string that may contain any characters (UTF-8), including special characters such as emoji 👍 🏁. You have to generate an XML element containing that string and pass it to an XSLT transformation engine.
Because I get transformation errors, I wonder how the Java code should process the string before inserting it into the final XML so that the XSLT transformation does not fail.
What I currently have in Java is this:
String inputValue = ...; // string received from an external client
Element target = ...; // element of the XML document to which the string is added
String xml10pattern = "[^"
+ "\u0009\r\n"
+ "\u0020-\uD7FF"
+ "\uE000-\uFFFD"
+ "\ud800\udc00-\udbff\udfff"
+ "]"; // matches every character that is illegal in XML 1.0
inputValue = inputValue.replaceAll(xml10pattern, "");
target.setAttribute("text", inputValue);
Is anything still missing to make this safer?
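For comparison, the same filter can be sketched as an explicit code-point loop, which makes the allowed ranges easier to check against the XML 1.0 Char production. This is only a minimal sketch; the class and method names (XmlChars, isValidXml10, stripInvalid) are hypothetical and not part of any library.

```java
final class XmlChars {
    private XmlChars() {}

    /** Returns true if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every code point that XML 1.0 does not allow, including unpaired surrogates. */
    static String stripInvalid(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // codePointAt joins a valid surrogate pair into one supplementary code point;
            // an unpaired surrogate is returned as-is and then rejected by the range check.
            int cp = input.codePointAt(i);
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Calling XmlChars.stripInvalid(inputValue) before target.setAttribute("text", inputValue) would take the place of the replaceAll call. Emoji survive because they are ordinary supplementary code points.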
Upvotes: 0
Views: 970
Reputation: 4284
A cheap option would be to additionally strip every character above U+00FF, so that only plain text (including line breaks etc.) is passed on. Note that the pattern below keeps the whole Latin-1 range, not just ASCII; for strict ASCII it would be "[^\\x00-\\x7F]":
String inputValue = ...; // string received from an external client
Element target = ...; // element of the XML document to which the string is added
String xml10pattern = "[^"
+ "\u0009\r\n"
+ "\u0020-\uD7FF"
+ "\uE000-\uFFFD"
+ "\ud800\udc00-\udbff\udfff"
+ "]"; // matches every character that is illegal in XML 1.0
inputValue = inputValue.replaceAll(xml10pattern, "");
inputValue = inputValue.replaceAll("[^\\x00-\\xFF]", "");
target.setAttribute("text", inputValue);
Any thoughts on this?
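To make the difference between the two ranges concrete, here is a small sketch that applies both patterns to the same input. The class and method names (RangeFilterDemo, keepLatin1, keepAscii) are made up for illustration.

```java
final class RangeFilterDemo {
    private RangeFilterDemo() {}

    /** Keeps U+0000 to U+00FF, i.e. Latin-1, as in the pattern used above. */
    static String keepLatin1(String s) {
        return s.replaceAll("[^\\x00-\\xFF]", "");
    }

    /** Keeps U+0000 to U+007F, i.e. strict ASCII. */
    static String keepAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        String input = "caf\u00e9 \uD83D\uDC4D"; // "café" followed by a thumbs-up emoji
        System.out.println(keepLatin1(input));   // the é is kept, the emoji is removed
        System.out.println(keepAscii(input));    // both the é and the emoji are removed
    }
}
```

Neither pattern removes control characters such as U+0001, so the XML 1.0 filter is still needed alongside it.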
Upvotes: 0
Reputation: 109593
The Apache Commons Lang library has StringEscapeUtils.escapeXml(string) (escapeXml10 / escapeXml11 in newer versions). This allows you to have & in your attribute.
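Whether manual escaping is needed depends on how the XML is produced. The sketch below assumes the element is built through the DOM API and serialized with a standard JAXP Transformer, which may not match the original setup. In that case the serializer escapes & itself, and a value escaped beforehand ends up escaped twice. The class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class AttributeEscapingDemo {
    private AttributeEscapingDemo() {}

    /** Builds <target text="..."/> via DOM and returns the serialized XML. */
    static String serializeWithAttribute(String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element target = doc.createElement("target");
            doc.appendChild(target);
            target.setAttribute("text", value); // raw value, no manual escaping

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // The serializer writes the ampersand as &amp; on its own.
        System.out.println(serializeWithAttribute("Tom & Jerry"));
        // A pre-escaped value is escaped again and becomes &amp;amp; in the output.
        System.out.println(serializeWithAttribute("Tom &amp; Jerry"));
    }
}
```

Escaping with StringEscapeUtils is the right tool when the XML is assembled by string concatenation. With a DOM Element, removing the illegal characters is the step that remains.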
Upvotes: 1