Java XML Numeric Character References

Question

I am having an issue when I parse an XML document that contains numeric character references (e.g. &#xA0;). When the document is parsed, the & is replaced with &amp;, so my parsed document contains &amp;#xA0;. How do I stop this from happening? I have tried calling setExpandEntityReferences(false) on the DocumentBuilderFactory, but that doesn't seem to change anything.

Here is my code for parsing the document:

public static Document getXmlDoc(File xmlFile) throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setIgnoringElementContentWhitespace(true);
    factory.setExpandEntityReferences(false);
    DocumentBuilder builder = factory.newDocumentBuilder();
    return builder.parse(xmlFile);
}

Any help would be greatly appreciated.

EDIT:

The XML that is parsed by the above code is modified and then written back to a file. The code to do this is below:

public static File saveXmlDoc(Document xmlDocument, String outputToDir, String outputFilename) throws IOException, TransformerException {
    String outputDir = outputToDir;
    if (!outputDir.endsWith(File.separator)) outputDir += File.separator;
    if (!new File(outputDir).exists()) new File(outputDir).mkdir();
    File xmlFile = new File(outputDir + outputFilename);
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "no");
    StreamResult saveResult = new StreamResult(outputDir + outputFilename);
    DOMSource source = new DOMSource(xmlDocument);
    transformer.transform(source, saveResult);

    return xmlFile;
}

EDIT 2:

Fixed a typo for factory.setIgnoringElementContentWhitespace(true);.

EDIT 3 - My Solution:

Since my reputation is too low to answer my own question, here is the solution I used to fix all of this.

Here are the functions I changed in order to resolve this issue:

To get the XML Document:

    public static Document getXmlDoc(File xmlFile) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringElementContentWhitespace(true);
        factory.setExpandEntityReferences(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(xmlFile);
    }
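For anyone wondering why setExpandEntityReferences(false) made no visible difference: as far as the XML spec goes, that flag concerns entity references (such as a &name; declared in a DTD), while numeric character references like &#xA0; are always resolved by the parser to the character itself (U+00A0). Here is a minimal sketch to check that. The class name CharRefParseDemo and the helper parseText are made up for illustration and are not part of the code above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original post.
class CharRefParseDemo {

    // Parses an XML string with entity expansion switched off and
    // returns the text content of the root element.
    static String parseText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setExpandEntityReferences(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String text = parseText("<a>x&#xA0;y</a>");
        // The reference has already been turned into a single character.
        System.out.println(text.length());          // 3
        System.out.println((int) text.charAt(1));   // 160, i.e. U+00A0
    }
}
```

So the DOM never holds the text "&#xA0;", only the character it stands for. Whether a reference reappears in the output is decided when the document is serialized.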

To save the XML Document:

    public static File saveXmlDoc(Document xmlDocument, String outputToDir, String outputFilename) throws Exception {
        readNodesForHexConversion(xmlDocument.getChildNodes());
        String xml = getXmlAsString(xmlDocument);

        // create the output directory (and any missing parents) if needed
        File outputDir = new File(outputToDir);
        if (!outputDir.exists()) outputDir.mkdirs();

        // new File(parent, child) inserts the separator, so outputToDir
        // does not have to end with one
        File xmlFile = new File(outputDir, outputFilename);

        // write the xml out to a file as UTF-8; try-with-resources closes
        // the stream even if the write fails, and the exception propagates
        try (FileOutputStream fos = new FileOutputStream(xmlFile)) {
            fos.write(xml.getBytes("UTF-8"));
            fos.flush();
        }

        return xmlFile;
    }

To convert the XML Document to String:

    public static String getXmlAsString(Document xmlDocument) throws TransformerFactoryConfigurationError, TransformerException {
        DOMSource domSource = new DOMSource(xmlDocument);
        StringWriter writer = new StringWriter();
        StreamResult result = new StreamResult(writer);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(domSource, result);
        return writer.toString();
    }
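A possible alternative to converting the nodes by hand, offered as a sketch rather than as the approach used above: ask the Transformer for US-ASCII output. A serializer cannot write a character such as U+00A0 directly in that encoding, so it should fall back to a numeric character reference. The exact form (decimal &#160; or hexadecimal) depends on the Transformer implementation, so treat that as an assumption to verify. AsciiSerializeDemo and toAsciiXml are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original post.
class AsciiSerializeDemo {

    // Serializes the document as US-ASCII bytes and returns them as a String.
    // Characters outside ASCII are expected to come out as numeric
    // character references (assumption: depends on the serializer in use).
    static String toAsciiXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<a>x&#xA0;y</a>")));
        // Print the serialized form to inspect how U+00A0 was written.
        System.out.println(toAsciiXml(doc));
    }
}
```

Note that the XML declaration in the output will then name US-ASCII as the encoding, which may or may not be acceptable for whatever consumes the file.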
