Reputation: 954
Is there any way to avoid preserving whitespace in the DOM (with whichever Java library)?
I have an XML file validated against an XSD schema. According to this schema, only the <text> element contains text; every other element contains only element nodes. When I edit the XML file, I add several kinds of whitespace (tabs, spaces, carriage returns, ...) for readability.
How can I parse my XML (without XSLT, using only Java libraries) without preserving the whitespace that the schema does not allow?
Upvotes: 1
Views: 2562
Reputation: 167401
The documentation for https://docs.oracle.com/javase/7/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setIgnoringElementContentWhitespace(boolean) says this setting "requires the parser to be in validating mode". With validation enabled, for example by supplying a schema through https://docs.oracle.com/javase/7/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setSchema(javax.xml.validation.Schema), the parser ignores whitespace in element-only content models.
Here is an example. Given the Java code
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setIgnoringElementContentWhitespace(true);
Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(new File("schema1.xsd"));
//dbf.setSchema(schema);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse("file1.xml");
System.out.println(doc.getDocumentElement().getChildNodes().getLength());
with a sample file
<root>
<item>a</item>
<item>b</item>
</root>
the number of child nodes printed is 5: the two item elements plus three whitespace-only text nodes. When I then remove the comment from
dbf.setSchema(schema);
and use a schema that defines element-only content for the root element, e.g.
<xs:schema version="1.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:element name="root">
<xs:complexType>
<xs:sequence maxOccurs="unbounded">
<xs:element name="item" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
the number of child nodes printed is only 2, because the whitespace-only text nodes are no longer added to the DOM.
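To try both cases side by side without creating files, here is a minimal self-contained sketch. It assumes the same schema and sample document as above, inlined as strings; the class name WhitespaceDemo and the helper countRootChildren are made up for this illustration and are not part of any library. It relies on the JDK's built-in parser behaving as described above.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: compares the child count of <root>
// with and without a schema set on the factory.
class WhitespaceDemo {

    // Same schema as above: <root> has element-only content.
    static final String XSD =
        "<xs:schema version=\"1.0\""
        + " xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " elementFormDefault=\"qualified\">"
        + "<xs:element name=\"root\"><xs:complexType>"
        + "<xs:sequence maxOccurs=\"unbounded\">"
        + "<xs:element name=\"item\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Same sample document as above, with line breaks between the elements.
    static final String XML =
        "<root>\n<item>a</item>\n<item>b</item>\n</root>";

    // Hypothetical helper: parses XML and returns the number of child nodes of <root>.
    static int countRootChildren(boolean useSchema) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setIgnoringElementContentWhitespace(true);
        if (useSchema) {
            // The schema gives the parser the content model it needs
            // to tell which whitespace is ignorable.
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            dbf.setSchema(schema);
        }
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without schema: " + countRootChildren(false));
        System.out.println("with schema:    " + countRootChildren(true));
    }
}
```

Note that whitespace inside <item> (or inside a <text> element declared as xs:string) is left alone, since only element-only content is affected.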
Upvotes: 5