Valeriane

Reputation: 954

DOM: how to avoid preserving whitespace

Is there any way to avoid preserving whitespace in a DOM (with any Java library)?

I have an XML file validated by an XSD schema. According to this schema, only the <text> element contains text; all other elements contain only element nodes. When I edit the XML file, I add various kinds of whitespace (tabs, spaces, carriage returns, ...) for readability.

How can I parse my XML (without XSLT, using only Java libraries) so that the whitespace the schema does not allow is not preserved?

Upvotes: 1

Views: 2562

Answers (1)

Martin Honnen

Reputation: 167401

The documentation of https://docs.oracle.com/javase/7/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setIgnoringElementContentWhitespace(boolean) describes a setting that drops whitespace in element-only content models, but it "requires the parser to be in validating mode". You can make the parser validate against your XSD by attaching the schema with https://docs.oracle.com/javase/7/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setSchema(javax.xml.validation.Schema); the whitespace between elements in element-only content is then ignored.

Here is an example: given the Java code

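    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import org.w3c.dom.Document;

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);
    // Drop whitespace in element-only content; only takes effect when the parser validates
    dbf.setIgnoringElementContentWhitespace(true);

    Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(new File("schema1.xsd"));
    // Attaching the schema turns on validation; commented out for the first run
    //dbf.setSchema(schema);

    DocumentBuilder db = dbf.newDocumentBuilder();

    Document doc = db.parse("file1.xml");

    // Print how many child nodes the root element has
    System.out.println(doc.getDocumentElement().getChildNodes().getLength());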

and the sample file

    <root>
        <item>a</item>
        <item>b</item>
    </root>

the number of child nodes printed is 5 (the two item elements plus three whitespace-only text nodes). Now, when I uncomment the line

    dbf.setSchema(schema);

and use a schema that defines element-only content for the root element, for example

    <xs:schema version="1.0"
               xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified">

        <xs:element name="root">
            <xs:complexType>
                <xs:sequence maxOccurs="unbounded">
                    <xs:element name="item" type="xs:string"/>
                </xs:sequence>
            </xs:complexType>
        </xs:element>

    </xs:schema>

the number of child nodes printed is only 2: the whitespace-only text nodes between the item elements are dropped.
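For reference, here is a minimal self-contained sketch of the same setup that parses one document twice, once without and once with the schema. It inlines the schema and the XML as strings instead of reading schema1.xsd and file1.xml, and it names the child element `text` to mirror the question; the class name `WhitespaceDemo`, the `parse` helper and the sample content are hypothetical stand-ins, not the asker's real files.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WhitespaceDemo {

    // Hypothetical schema: <root> has element-only content, only <text> holds character data.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' elementFormDefault='qualified'>"
        + "<xs:element name='root'><xs:complexType>"
        + "<xs:sequence maxOccurs='unbounded'><xs:element name='text' type='xs:string'/></xs:sequence>"
        + "</xs:complexType></xs:element></xs:schema>";

    // Indented instance document: the newlines and spaces between elements become text nodes.
    static final String XML =
        "<root>\n    <text> a </text>\n    <text>b</text>\n</root>";

    static Document parse(boolean withSchema) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setIgnoringElementContentWhitespace(true);
        if (withSchema) {
            // Attaching a schema makes the parser validate, which the whitespace setting relies on
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            dbf.setSchema(schema);
        }
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        Document plain = parse(false);
        Document validated = parse(true);
        System.out.println(plain.getDocumentElement().getChildNodes().getLength());
        System.out.println(validated.getDocumentElement().getChildNodes().getLength());
        // Whitespace inside <text> (xs:string content) is not element content, so it is kept
        System.out.println("[" + validated.getDocumentElement().getFirstChild().getTextContent() + "]");
    }
}
```

Run with the JDK's built-in parser, this should print 5 and then 2, matching the counts reported above, followed by `[ a ]`: only whitespace in element-only content is removed, while the text inside `<text>` is left untouched.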

Upvotes: 5
