Reputation: 11
I have a string that was encoded as UTF-16. When parsing it with javax.xml.parsers.DocumentBuilder, I got an error like this:
Character reference "�" is an invalid XML character
Here is the code I used to parse the XML:
InputSource inputSource = new InputSource();
inputSource.setCharacterStream(new StringReader(xmlString));
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
org.w3c.dom.Document document = parser.parse(inputSource);
My question is: how can I replace the invalid characters with a space?
Upvotes: 1
Views: 8539
Reputation: 5145
Use StringEscapeUtils.escapeXml from Apache Commons Lang. From its Javadoc:
escapeXml
public static void escapeXml(java.io.Writer writer,
java.lang.String str)
throws java.io.IOException
Escapes the characters in a String using XML entities.
For example: "bread" & "butter" => &quot;bread&quot; &amp; &quot;butter&quot;.
Supports only the five basic XML entities (gt, lt, quot, amp, apos).
Does not support DTDs or external entities.
Note that unicode characters greater than 0x7f are currently escaped to their
numerical \\u equivalent. This may change in future releases.
Parameters:
writer - the writer receiving the unescaped string, not null
str - the String to escape, may be null
Throws:
java.lang.IllegalArgumentException - if the writer is null
java.io.IOException - if there is a problem writing
See Also:
unescapeXml(java.lang.String)
Upvotes: 0
Reputation: 21
You just need to use String.replaceAll and pass the pattern of invalid characters.
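A minimal sketch of this approach, assuming the goal is to replace everything outside the XML 1.0 character range with a space before parsing. The class and method names (XmlSanitizer, sanitize, isValidXmlChar) are made up for illustration. Because the error message in the question mentions a character reference, the sketch also replaces numeric references such as `&#0;` that point at invalid code points, not only raw characters. It works on the whole string, so it would also touch matching text inside CDATA sections.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlSanitizer {
    // Matches any raw character outside the XML 1.0 Char production
    private static final Pattern INVALID_RAW = Pattern.compile(
        "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Matches numeric character references: decimal (&#65;) or hex (&#x41;)
    private static final Pattern CHAR_REF =
        Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    // True if the code point is allowed in an XML 1.0 document
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces invalid raw characters and invalid numeric references with a space
    static String sanitize(String xml) {
        String cleaned = INVALID_RAW.matcher(xml).replaceAll(" ");
        Matcher m = CHAR_REF.matcher(cleaned);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            int cp;
            try {
                cp = body.startsWith("x")
                    ? Integer.parseInt(body.substring(1), 16)
                    : Integer.parseInt(body);
            } catch (NumberFormatException e) {
                cp = -1; // number too large to be a code point, treat as invalid
            }
            String replacement = isValidXmlChar(cp) ? m.group() : " ";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this in place, the question's code would pass `XmlSanitizer.sanitize(xmlString)` to the StringReader instead of `xmlString`.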
Upvotes: 1
Reputation: 8261
You are trying to parse an invalid XML entity, and that is what raises the exception. It seems you do not need to worry about UTF-16 in your situation.
Find some explanation and example here.
For example, a bare & character cannot appear in valid XML; we need to write &amp; instead. Here &amp; is the XML entity.
The example above should make clear what an XML entity is.
As I understand it, some XML entities are not valid. But again, no need to worry: it is possible to declare and add new XML entities. Take a look at the above article for more detail.
EDIT: Assuming there is an & character making the XML invalid.
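The difference between a bare & and the &amp; entity can be checked with the same DocumentBuilder API the question uses. This is only a sketch: the class name AmpersandDemo and the helper isWellFormed are hypothetical, and the sample strings are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class AmpersandDemo {
    // Returns true if the string parses as well-formed XML, false if the parser rejects it
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder parser =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            parser.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a well-formedness error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory input
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a>bread & butter</a>"));     // false
        System.out.println(isWellFormed("<a>bread &amp; butter</a>")); // true
    }
}
```

The default parser may also print a "[Fatal Error]" line to standard error for the first input; that is separate from the return value.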
Upvotes: 0