Reputation: 599
The bug in question is: http://java.net/jira/browse/JAXB-614
That bug report recommends the workaround described at the following link: http://blog.lesc.se/2009/03/escape-illegal-characters-with-jaxb-xml.html
The workaround lists 31 character codes:
final String escapeString = "\u0000\u0001\u0002\u0003\u0004\u0005" +
"\u0006\u0007\u0008\u000B\u000C\u000E\u000F\u0010\u0011\u0012" +
"\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001A\u001B\u001C" +
"\u001D\u001E\u001F\uFFFE\uFFFF";
Now my question is: can I get the actual ASCII characters for the codes listed above?
Upvotes: 3
Views: 7800
Reputation: 101
The official list of allowed characters in an XML document is defined as follows:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate
blocks, FFFE, and FFFF. */
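However, beyond that definition, the XML Recommendation also encourages authors to avoid "compatibility characters" (as defined in the Unicode Standard) and discourages the characters in the following ranges, which are either control characters or permanently undefined Unicode characters: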
[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].
See: https://www.w3.org/TR/xml/#charsets
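As a rough illustration, the Char production and the discouraged ranges quoted above can be written as two Java predicates. This is only a sketch: the class name `XmlChars` and both method names are made up for this example and do not come from any library.

```java
public class XmlChars {
    // True if the code point matches the XML 1.0 Char production quoted above.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // True if the code point falls in one of the discouraged ranges listed above.
    static boolean isDiscouraged(int cp) {
        return (cp >= 0x7F && cp <= 0x84)
                || (cp >= 0x86 && cp <= 0x9F)
                || (cp >= 0xFDD0 && cp <= 0xFDEF)
                // The last two code points of every supplementary plane (1 to 16).
                || (cp >= 0x10000 && cp <= 0x10FFFF && (cp & 0xFFFF) >= 0xFFFE);
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar(0x0));        // NUL is not allowed
        System.out.println(isXmlChar(0x9));        // tab is allowed
        System.out.println(isDiscouraged(0x1FFFE)); // allowed by Char, but discouraged
    }
}
```

Note that the discouraged code points in the supplementary planes still satisfy the Char production, so the two checks answer different questions.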
Upvotes: 0
Reputation: 963
I've written a method that returns a List<Character> containing all of the invalid XML characters. This helped me with a unit test for a regular expression that stripped these characters. You can view the gist here
In case the above link stops working, here is the code:
// 0xFFFF is the largest value a char can hold, so the range stops there.
return IntStream.rangeClosed(0, 0xFFFF).filter(XMLChar::isInvalid).mapToObj(c -> Character.valueOf((char) c))
.collect(Collectors.toList());
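The regular expression mentioned in this answer is not shown, so the following is only a guess at what such a stripping pattern might look like, built from the XML 1.0 Char production. The class name `XmlSanitizer` and the method `strip` are hypothetical.

```java
import java.util.regex.Pattern;

public class XmlSanitizer {
    // Matches any code point that is NOT allowed by the XML 1.0 Char production.
    // Java's regex engine works on code points, so valid surrogate pairs are kept
    // and lone surrogates are removed.
    private static final Pattern INVALID_XML_10 = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Returns the input with every invalid XML 1.0 character removed.
    static String strip(String input) {
        return INVALID_XML_10.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("a\u0000b\uFFFEc"));
    }
}
```

Removing characters silently changes the data, so whether to strip, replace, or reject them depends on what the XML is used for.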
Upvotes: 1
Reputation: 77971
Search Google for "java unicode". Example result as follows:
http://www.ssec.wisc.edu/~tomw/java/unicode.html
Unicode is designed to cover all character sets. The original "ASCII" was only good for North America. Java itself has Unicode support built in, but there are still lots of character encoding "gotchas" to discover :-)
Upvotes: 0
Reputation: 160191
ASCII? Only partly: ASCII is a 7-bit code covering 0x00 to 0x7F, so U+FFFE and U+FFFF have no ASCII equivalent. The code points 0x1F and below are all control characters.
If you want details on the Unicode characters those code points represent, see any Unicode code chart.
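If a chart is not handy, the JDK itself can report something about each code point in the question's string. The sketch below is only an illustration: the class `EscapeStringInfo` and the helper `describe` are invented for this example, and the names printed depend on the Unicode data shipped with your JDK (`Character.getName` returns null for code points the JDK treats as unassigned).

```java
public class EscapeStringInfo {
    // The escape string from the question, copied unchanged.
    static final String ESCAPE_STRING = "\u0000\u0001\u0002\u0003\u0004\u0005"
            + "\u0006\u0007\u0008\u000B\u000C\u000E\u000F\u0010\u0011\u0012"
            + "\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001A\u001B\u001C"
            + "\u001D\u001E\u001F\uFFFE\uFFFF";

    // Hypothetical helper: builds a one-line description of a code point.
    static String describe(int codePoint) {
        String name = Character.getName(codePoint); // may be null
        String kind = Character.isISOControl(codePoint) ? "control" : "not a control";
        return String.format("U+%04X  %-13s  %s",
                codePoint, kind, name == null ? "(no name)" : name);
    }

    public static void main(String[] args) {
        // Print one description line per code point in the string.
        ESCAPE_STRING.codePoints().forEach(cp -> System.out.println(describe(cp)));
    }
}
```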
Upvotes: 0
Reputation: 77464
If you want to store binary data in XML, it makes some sense to use e.g. Base64 encoding. I don't think substituting them with the same "invalid" character is the best approach.
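A minimal sketch of the Base64 idea using the JDK's `java.util.Base64` (available since Java 8). The class name `Base64Payload` and its two helpers are made up for this example. The encoded text contains only letters, digits, `+`, `/` and `=`, all of which are legal in XML.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Payload {
    // Encodes arbitrary bytes as text that is safe to place in an XML element.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recovers the original bytes from the Base64 text.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x01, 0x02, 0x1F}; // bytes that are illegal as XML characters
        String text = encode(raw);
        System.out.println(text);
        System.out.println(Arrays.equals(raw, decode(text)));
    }
}
```

JAXB already maps `byte[]` properties to `xs:base64Binary` by default, so declaring the field as `byte[]` instead of `String` may be enough without calling the encoder by hand.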
Upvotes: 1
Reputation: 887469
None of those characters are printable.
Pasting that string in a JavaScript console gives "�".
Upvotes: 1