user1328572

Reputation: 599

Can anyone give me a list of invalid XML characters?

Here is the bug: http://java.net/jira/browse/JAXB-614

The bug above recommends using the resolution described at the following link: http://blog.lesc.se/2009/03/escape-illegal-characters-with-jaxb-xml.html

The resolution lists 31 code points:

final String escapeString = "\u0000\u0001\u0002\u0003\u0004\u0005" +                
    "\u0006\u0007\u0008\u000B\u000C\u000E\u000F\u0010\u0011\u0012" +            
    "\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001A\u001B\u001C" +               
    "\u001D\u001E\u001F\uFFFE\uFFFF";

Now my question is: can I get the actual ASCII characters for the codes listed above?
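To answer the question directly: the first 29 code points in that string are the ASCII control codes 0x00-0x1F minus tab (0x09), line feed (0x0A) and carriage return (0x0D), which XML does allow. U+FFFE and U+FFFF are not ASCII at all. A minimal Java sketch that labels each one with its conventional ASCII abbreviation (the class and method names are made up for illustration):

```java
// Sketch: label each of the blog's 31 escaped code points with its
// conventional ASCII abbreviation. Class and method names are hypothetical.
final class EscapeListNames {
    // Standard ASCII control-code abbreviations for 0x00-0x1F, indexed by value.
    private static final String[] C0_NAMES = {
        "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
        "BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
        "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
        "CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US"
    };

    // The escape string from the blog post, unchanged.
    static final String ESCAPE_STRING = "\u0000\u0001\u0002\u0003\u0004\u0005"
            + "\u0006\u0007\u0008\u000B\u000C\u000E\u000F\u0010\u0011\u0012"
            + "\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001A\u001B\u001C"
            + "\u001D\u001E\u001F\uFFFE\uFFFF";

    static String describe(char c) {
        if (c < 0x20) return C0_NAMES[c];    // ASCII control code
        if (c == 0x7F) return "DEL";         // last ASCII control code
        if (c < 0x7F) return "'" + c + "'";  // printable ASCII
        if (c == 0xFFFE || c == 0xFFFF) return "noncharacter (not ASCII)";
        return "not ASCII";
    }

    public static void main(String[] args) {
        for (char c : ESCAPE_STRING.toCharArray()) {
            // prints lines such as "U+001B  ESC"
            System.out.printf("U+%04X  %s%n", (int) c, describe(c));
        }
    }
}
```

None of these has a visible glyph, which is why they come out as blanks or replacement marks when printed.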

Upvotes: 3

Views: 7800

Answers (6)

user373533

Reputation: 101

The official list of allowed characters in an XML document is defined as follows:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
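The production translates directly into a predicate over code points. A sketch in Java, assuming you check text one code point at a time (the names `XmlCharRule`, `isXmlChar` and `isValidXmlText` are hypothetical):

```java
import java.util.stream.IntStream;

// Sketch of the XML 1.0 Char production as a predicate over Unicode code
// points. Class and method names are hypothetical.
final class XmlCharRule {
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // String.codePoints() joins valid surrogate pairs into one code point and
    // reports a lone surrogate as its own value, which isXmlChar rejects.
    static boolean isValidXmlText(String s) {
        return s.codePoints().allMatch(XmlCharRule::isXmlChar);
    }

    public static void main(String[] args) {
        long c0Invalid = IntStream.range(0, 0x20).filter(cp -> !isXmlChar(cp)).count();
        System.out.println(c0Invalid);                   // 29: all but tab, LF, CR
        System.out.println(isValidXmlText("a\u0001b")); // false
    }
}
```

Counting the BMP this way gives 2,079 forbidden values: the 29 C0 controls the blog lists, U+FFFE and U+FFFF, plus the 2,048 surrogate code units U+D800-U+DFFF. A char-by-char escape list like the blog's can't express that last rule, because surrogates are fine as a pair but not on their own.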

However, in addition to that definition, the XML Recommendation also suggests avoiding what it terms "compatibility characters", which it defines as follows:

[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].
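These ranges are still legal XML, only discouraged, so they're best treated as a separate check. A sketch with hypothetical names; the supplementary-plane entries all have the form U+nFFFE-U+nFFFF, so a bit mask covers them:

```java
import java.util.stream.IntStream;

// Sketch: the "compatibility characters" the XML 1.0 Recommendation asks
// authors to avoid. They are still legal Chars. Names are hypothetical.
final class XmlDiscouraged {
    static boolean isDiscouraged(int cp) {
        if (cp >= 0x7F && cp <= 0x84) return true;     // DEL and C1 controls 0x80-0x84
        if (cp >= 0x86 && cp <= 0x9F) return true;     // remaining C1 controls (0x85 is skipped)
        if (cp >= 0xFDD0 && cp <= 0xFDEF) return true; // noncharacters U+FDD0-U+FDEF
        // Last two code points of planes 1-16 (#x1FFFE-#x1FFFF ... #x10FFFE-#x10FFFF):
        // their low 16 bits are 0xFFFE or 0xFFFF.
        return cp >= 0x1FFFE && cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE;
    }

    public static void main(String[] args) {
        long total = IntStream.rangeClosed(0, 0x10FFFF).filter(XmlDiscouraged::isDiscouraged).count();
        System.out.println(total); // 96 = 6 + 26 + 32 + 32
    }
}
```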

See: https://www.w3.org/TR/xml/#charsets

Upvotes: 0

mricci

Reputation: 963

I've written a method that returns a List<Character> containing all of the invalid XML characters. It helped me write a unit test for a regular expression that strips these characters. You can view the gist here.

In case the above link stops working, here is the code:

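// XMLChar is Xerces' org.apache.xerces.util.XMLChar; char values stop at 0xFFFF
return IntStream.rangeClosed(0, 0xFFFF).filter(XMLChar::isInvalid)
        .mapToObj(c -> (char) c) // autoboxed; new Character(...) is deprecated
        .collect(Collectors.toList());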
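If you don't have Xerces on the classpath (`XMLChar` lives in `org.apache.xerces.util`), a similar list can be derived from a regular expression built from the `Char` production. The regex from the unit test isn't shown, so the pattern below is an assumption, as are the class and method names:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Sketch: find and strip characters outside the XML 1.0 Char production
// without Xerces. Pattern, class and method names are hypothetical.
final class InvalidXmlChars {
    // Matches anything NOT allowed by Char. java.util.regex works on code
    // points, so a well-formed surrogate pair is one supplementary character
    // (kept), while a lone surrogate is matched on its own (removed).
    static final Pattern INVALID = Pattern.compile(
            "[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    static String strip(String s) {
        return INVALID.matcher(s).replaceAll("");
    }

    // Every char value (0 to 0xFFFF) that the pattern rejects.
    static List<Character> invalidBmpChars() {
        return IntStream.rangeClosed(0, 0xFFFF)
                .filter(c -> INVALID.matcher(String.valueOf((char) c)).matches())
                .mapToObj(c -> (char) c)
                .collect(Collectors.toList());
    }
}
```

`invalidBmpChars()` should return 2,079 entries: the 29 C0 controls, the 2,048 lone surrogates and U+FFFE/U+FFFF. The hand-written class works without any XML library on the classpath, at the cost of maintaining the ranges yourself.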

Upvotes: 1

Mark O'Connor

Reputation: 77971

Search Google for "java unicode". Here is an example result:

http://www.ssec.wisc.edu/~tomw/java/unicode.html

Unicode is designed to cover all character sets. The original ASCII was only good for North America. Java itself has Unicode support built in, but there are still lots of character-encoding "gotchas" to discover :-)

Upvotes: 0

Dave Newton

Reputation: 160191

ASCII? Not quite: ASCII only goes up to 127 (0x7F), so \uFFFE and \uFFFF aren't ASCII at all. The code points 0x1F and below are all control characters.

http://www.utf8-chartable.de/

Upvotes: 0

Has QUIT--Anony-Mousse

Reputation: 77464

If you want to store binary data in XML, it makes sense to use an encoding such as Base64. I don't think replacing all of them with the same "invalid" character is the best approach.
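A minimal sketch of that idea using `java.util.Base64` (Java 8+); the `<payload>` element and the class name are just for illustration. Base64 output only uses A-Z, a-z, 0-9, `+`, `/` and `=`, so every byte value survives, including the control codes XML forbids:

```java
import java.util.Base64;

// Sketch: carry arbitrary bytes (including 0x00-0x1F) in XML text via Base64.
// The <payload> element name and class name are hypothetical.
final class Base64Payload {
    static String toXmlElement(byte[] data) {
        // Base64 text contains only characters that are legal in XML content.
        return "<payload>" + Base64.getEncoder().encodeToString(data) + "</payload>";
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x01, 0x1F, (byte) 0xFF};
        System.out.println(toXmlElement(raw)); // <payload>AAEf/w==</payload>
    }
}
```

With JAXB specifically, a `byte[]` property is mapped to `xs:base64Binary` by default, so you may not need to encode by hand.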

Upvotes: 1

SLaks

Reputation: 887469

None of those characters are printable.

Pasting that string into a JavaScript console gives "�".

Upvotes: 1
