Reputation: 501
While processing an XML file with XSLT, I get the following errors, but I cannot see those characters in the XML:
Character reference "" is an invalid XML character.
Character reference "" is an invalid XML character.
Character reference "" is an invalid XML character.
Character reference "" is an invalid XML character.
Character reference "" is an invalid XML character.
Character reference "" is an invalid XML character.
Character reference "" is an invalid XML character.
Please advise.
The XML is generated from a CSV text file that is encoded in UTF-8.
Upvotes: 1
Views: 5498
Reputation: 163262
These character references are legal in XML 1.1 but not in XML 1.0. Check whether the XML parser you are using supports XML 1.1, and whether the XML declaration at the top of the file specifies <?xml version="1.1"?>.
Upvotes: 2
Reputation: 29022
These are non-printable ASCII control codes, in the range 0 to 31 decimal in the ASCII table. A text editor does not display them, which is why you don't see them. If you can switch your editor to hex mode, you'll find values like 04h (= 4 decimal), 12h (= 18 decimal), and so on next to normal UTF-8 (or other) encodings like 41h for 'A' and 42h for 'B'.
So the easiest way to get rid of them is to use a tool that filters them out. On Linux you could use the approach described here.
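If no ready-made tool is at hand, the filtering can also be done in a few lines of Python. This is only a minimal sketch, assuming the CSV content has already been read into a string; the function name `strip_control_chars` is made up for illustration. It deletes the code points 0 to 31 except tab, line feed and carriage return, which XML 1.0 does allow.

```python
# Code points 0-31 are ASCII control codes; XML 1.0 permits only
# tab (0x09), line feed (0x0A) and carriage return (0x0D) among them.
_ALLOWED = {0x09, 0x0A, 0x0D}
_DELETE_TABLE = {cp: None for cp in range(0x20) if cp not in _ALLOWED}


def strip_control_chars(text: str) -> str:
    """Return text with the disallowed ASCII control characters removed."""
    return text.translate(_DELETE_TABLE)


sample = "A\x04B\x12C\tD\n"
print(repr(strip_control_chars(sample)))  # 'ABC\tD\n'
```

Running the CSV text through such a filter before the XML is built should keep the characters from reaching the XSLT processor at all.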
Upvotes: 1
Reputation: 310
The number after &# is a character code in decimal format (&#x would specify the code in hexadecimal format); for values this small it is the same as the ASCII code. The codes 16, 4, 18, etc. do not specify printable characters. They are control characters, which text editors usually do not show by default. These characters (or rather bytes) are not allowed in XML, with a few exceptions, so your XML is not well-formed.
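To see which references in a file are the offending ones, something like the following could be used. It is a rough sketch, not a full XML scanner: the helper name `control_char_refs` is hypothetical, and it only looks for decimal (`&#16;`) and hexadecimal (`&#x10;`) references that decode to a control code below 32 other than tab, line feed and carriage return.

```python
import re
from typing import List, Tuple

# Matches decimal (&#16;) and hexadecimal (&#x10;) character references.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def control_char_refs(xml_text: str) -> List[Tuple[str, int]]:
    """Return (reference, code point) pairs that name a forbidden control code."""
    found = []
    for match in _CHAR_REF.finditer(xml_text):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point < 0x20 and code_point not in (0x09, 0x0A, 0x0D):
            found.append((match.group(0), code_point))
    return found


sample = "<row>ok&#16;then&#x4;and&#18;but&#10;fine&#65;</row>"
print(control_char_refs(sample))
# [('&#16;', 16), ('&#x4;', 4), ('&#18;', 18)]
```

Note that `&#10;` (line feed) and `&#65;` ('A') are left alone, since both are legal in XML 1.0.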
The CSV file probably contained these illegal bytes and the XML was formed without any kind of content validation (i.e. the contents of the CSV file have been just copied byte-by-byte to the XML).
Here are some options:
Upvotes: 3
Reputation: 66714
Those are control characters. Most control characters, and characters outside the permitted Unicode ranges, are not allowed in XML 1.0. This also means that referring to such a character through a character reference is forbidden.
See the XML 1.0 Recommendation, §2.2 Characters.
The global list of allowed characters is:
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
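The production above translates fairly directly into a negated regular-expression character class. The following is a sketch under that assumption, with a made-up helper name `find_invalid_chars`; it reports the position and code point of every character in a string that falls outside the `Char` production, which can help locate the bad input in the CSV.

```python
import re

# Negated character class built from production [2] Char of XML 1.0:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_chars(text):
    """Return (index, 'U+XXXX') for each character outside the Char production."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in _INVALID_XML10.finditer(text)
    ]


sample = "ok\x10 tab\t \ufffe end"
print(find_invalid_chars(sample))
# [(2, 'U+0010'), (9, 'U+FFFE')]
```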
Upvotes: 1