Reputation: 30056
I need to escape some control characters in XML, such as the ASCII 31 character, the hex 0x0b character, and others.
I tried using StringEscapeUtils from commons-lang, but it doesn't work as expected!
Upvotes: 1
Views: 7138
Reputation: 33436
Based on the JavaDoc, StringEscapeUtils.escapeXml(java.lang.String) only supports the five basic XML entities (gt, lt, quot, amp, apos). In general, XML does not support control characters in either raw or escaped form. See this posting for more information.
Upvotes: 2
Reputation: 76709
StringEscapeUtils.escapeXml escapes only the following 5 characters into XML entities:
" (the double quote, 0x22)
& (the ampersand, 0x26)
< (the less-than sign, 0x3C)
> (the greater-than sign, 0x3E)
' (the apostrophe, 0x27)
If you need to escape any other characters, especially the ASCII control characters, then you'll need to roll your own class that does this. After all, none of the control characters are even considered by HTML to have equivalent character entity references in an HTML document. In other words, if you need to convert the character with code 31 (0x1F) to a numeric character reference such as &#31;, then you'll need to write it yourself.
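A minimal sketch of such a hand-rolled escaper follows, assuming the goal is to write C0 control characters as decimal numeric character references. The class name ControlCharEscaper and the method escape are hypothetical and not part of commons-lang. Keep in mind that an XML 1.0 parser will still reject references such as &#31;, while XML 1.1 accepts references to characters 1 through 31.

```java
// Hypothetical helper, not part of commons-lang.
final class ControlCharEscaper {
    private ControlCharEscaper() {}

    /**
     * Escapes the five predefined XML entities and writes C0 control
     * characters (other than tab, LF and CR) as decimal numeric
     * character references, e.g. character 31 becomes "&#31;".
     */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                        // Assumption: the consumer understands numeric references
                        // to control characters (XML 1.0 parsers do not).
                        out.append("&#").append((int) c).append(';');
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

Calling ControlCharEscaper.escape on a string that contains character 31 yields the same string with that character replaced by &#31;. Tab, line feed and carriage return pass through unchanged because XML allows them as-is.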
Note:
Based on Benjamin's point on using control characters in the document, it is unlikely that you will need to do this in the first place, especially if the parser that processes these escaped elements will not transform them back into control characters (or will simply throw an exception). You are better off not writing control characters into the XML document that you are preparing in the first place.
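If dropping the control characters is acceptable, one possible approach is to filter the text before it is written. The sketch below keeps only code points allowed by the XML 1.0 Char production; the class name XmlCharFilter and the method stripInvalidXml10 are hypothetical names chosen for illustration.

```java
// Hypothetical helper, shown only as a sketch.
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** Removes every code point that XML 1.0 does not allow in a document. */
    static String stripInvalidXml10(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF]
            //             | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
            boolean allowed = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (allowed) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

The filtered string can then be passed to StringEscapeUtils.escapeXml or any other escaper. Whether silently dropping characters is appropriate depends on the data, so treat this as a starting point.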
Upvotes: 2
Reputation: 484
Actually, more than the 5 special characters above are escaped. The method StringEscapeUtils.escapeXml
also escapes most Unicode characters. The Javadoc for the method says:
Note that unicode characters greater than 0x7f are currently escaped to their numerical \u equivalent. This may change in future releases.
Upvotes: 2