user1574322

Reputation: 11

How to replace invalid characters in XML string?

I have a UTF-16 encoded string. When I parse it with javax.xml.parsers.DocumentBuilder, I get an error like this:

Character reference "&#x0" is an invalid XML character

Here is the code I used to parse the XML:

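import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

InputSource inputSource = new InputSource();
inputSource.setCharacterStream(new StringReader(xmlString));
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
org.w3c.dom.Document document = parser.parse(inputSource);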

My question is: how can I replace the invalid characters with a space?

Upvotes: 1

Views: 8539

Answers (3)

srini.venigalla

Reputation: 5145

Use StringEscapeUtils.escapeXml from Apache Commons Lang. From its Javadoc:

escapeXml

public static void escapeXml(java.io.Writer writer,
                             java.lang.String str)
                      throws java.io.IOException

Escapes the characters in a String using XML entities.

For example: "bread" & "butter" => &quot;bread&quot; &amp; &quot;butter&quot;.

Supports only the five basic XML entities (gt, lt, quot, amp, apos). 
Does not support DTDs or external entities.

Note that unicode characters greater than 0x7f are currently escaped to their 
numerical \\u equivalent. This may change in future releases.

Parameters:
    writer - the writer receiving the unescaped string, not null
    str - the String to escape, may be null 
Throws:
    java.lang.IllegalArgumentException - if the writer is null 
    java.io.IOException - if there is a problem writing
See Also:
    unescapeXml(java.lang.String)

Upvotes: 0

S N

Reputation: 21

You just need to use String.replaceAll and pass the pattern of invalid characters.
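A minimal sketch of that idea follows. Note that the error message in the question mentions a character reference, which suggests the string contains the literal text `&#x0;` rather than a raw NUL character, so a pattern over raw characters alone may not be enough. The class and method names (`XmlSanitizer`, `sanitize`, `isValidXmlChar`) are hypothetical, and the sketch assumes XML 1.0 rules for valid characters. It replaces both invalid numeric character references and invalid raw characters with a space.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class XmlSanitizer {
    // Numeric character references such as &#0; (decimal) or &#x1F; (hex).
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // True if the code point is allowed by the XML 1.0 Char production.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String sanitize(String xml) {
        // Pass 1: replace character references that point at invalid code points.
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int cp;
            try {
                cp = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
            } catch (NumberFormatException e) {
                cp = -1; // too large to be a code point, treat as invalid
            }
            String replacement = isValidXmlChar(cp) ? m.group() : " ";
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);

        // Pass 2: replace raw characters that are invalid (including lone surrogates).
        StringBuilder out = new StringBuilder(sb.length());
        for (int i = 0; i < sb.length(); ) {
            int cp = sb.codePointAt(i);
            out.appendCodePoint(isValidXmlChar(cp) ? cp : ' ');
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Under these assumptions, calling `XmlSanitizer.sanitize(xmlString)` before wrapping the result in the `StringReader` should let the parser get past the invalid reference. Valid references such as `&#65;` are left untouched.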

Upvotes: 1

Kowser

Reputation: 8261

You are trying to parse an invalid XML entity, and that is what raises the exception. You probably do not need to worry about UTF-16 in this situation.

Find some explanation and example here.

For example, a bare & character is not allowed in valid XML; we need to write &amp; instead. Here &amp; is the XML entity.

The example above should make it clear what an XML entity is.

As I understand it, some XML entities are not valid. But again, no need to worry: it is possible to declare and add new XML entities. Take a look at the article above for more detail.


EDIT: This assumes that a bare & character is what makes the XML invalid.
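If a bare ampersand really is the problem, one possible pre-processing step is sketched below. The names `AmpersandEscaper` and `escapeBareAmpersands` are hypothetical. The sketch assumes that any `&` not followed by something shaped like an entity or character reference should become `&amp;`, and it does not special-case CDATA sections or comments.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class AmpersandEscaper {
    // Matches an '&' that does not start a named entity (&name;)
    // or a numeric character reference (&#10; or &#x0A;).
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#[xX][0-9a-fA-F]+);)");

    static String escapeBareAmpersands(String xml) {
        // Existing references such as &amp; or &#38; are left as they are.
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }
}
```

This only handles the ampersand case. It would not help with a reference like `&#x0;`, which is syntactically a character reference but names a code point that XML 1.0 does not allow.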

Upvotes: 0
