Reputation: 311
I work with PTC Arbortext Editor, which was originally written in the pre-XML (SGML) days of the late 1980s. I have a Java application that uses org.custommonkey.xmlunit to diff XML files.
The diff tool fails to parse files that depend on a catalog path: (on Windows) a semicolon-separated list of absolute paths to folders in which the editor looks for catalog and/or catalog.xml files. These catalog files may use the CATALOG directive to point to further catalog files, and they map PUBLIC identifiers to paths that are relative to the particular catalog file.
The XML I am parsing relies on this catalog information and may contain file entities as well as XML inclusions.
For some use cases, I can set validating false and that works (it is reasonable to assume the two files are valid), but for some files I have to read the catalog info to resolve file entities in the XML.
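A minimal sketch of that non-validating fallback, assuming a Xerces-based parser such as the JDK's built-in one (the load-external-dtd feature URI is Xerces-specific). The class and method names are made up for illustration. It only shows that the parser never goes looking for the DTD when the feature is off, which is also why file entities declared in that DTD cannot be resolved this way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original program.
class SkipExternalDtdSketch {
    // Xerces feature that controls whether a non-validating parser loads the external DTD.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses the XML text without loading the external DTD and returns the root element name. */
    static String rootNameWithoutDtd(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            factory.setFeature(LOAD_EXTERNAL_DTD, false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The DTD path below does not exist; the parse still succeeds because the DTD is skipped.
        String xml = "<!DOCTYPE form PUBLIC \"-//Test//Forms Document Type//EN\" "
                + "\"no-such-dir/forms.dtd\"><form/>";
        System.out.println(rootNameWithoutDtd(xml));
    }
}
```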
I can ask the user to provide a list of absolute paths to their top-level catalog locations. However, I am rather lost when it comes to selecting a resolver and integrating it into my code.
I am using Java 1.8 but don't mind moving to 10 if that would help or simplify things. It looks like Java 9 added some simple catalog support in javax.xml.catalog, which isn't available in 1.8.
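For reference, a minimal sketch of the javax.xml.catalog API mentioned above, assuming Java 9 or later. As far as I know this API reads only OASIS XML catalogs (the catalog.xml style), not the plain-text catalog files with PUBLIC and CATALOG lines, so it would cover only part of the requirement. The class name, the temporary directory, and the generated catalog.xml are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

// Hypothetical demo class; requires Java 9+.
class JdkCatalogSketch {

    /** Looks up a PUBLIC identifier in one catalog.xml and returns the resolved system ID. */
    static String resolvePublic(URI catalogXml, String publicId, String systemId) {
        CatalogResolver resolver =
                CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogXml);
        InputSource source = resolver.resolveEntity(publicId, systemId);
        return source == null ? null : source.getSystemId();
    }

    /** Writes a one-entry catalog.xml to a temp folder and resolves against it. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("catalog-sketch");
            Path catalog = dir.resolve("catalog.xml");
            String xml = "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
                    + "<public publicId=\"-//Test//Forms Document Type//EN\" uri=\"forms.dtd\"/>"
                    + "</catalog>";
            Files.write(catalog, xml.getBytes(StandardCharsets.UTF_8));
            // The uri attribute is relative, so it resolves next to the catalog file.
            return resolvePublic(catalog.toUri(), "-//Test//Forms Document Type//EN", "forms.dtd");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The resolver returned by CatalogManager.catalogResolver also implements org.xml.sax.EntityResolver, so it could be passed to DocumentBuilder.setEntityResolver.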
I can provide my parsing code if that matters, but I'm not stuck on any one parser.
My code is below. I switched from LSParser to DocumentBuilder for the sake of setValidating(false).
Here are a couple of excerpts from one of the files I'd like to be able to work with:
<?xml version="1.0" encoding="UTF-8"?>
<!--Arbortext, Inc., 1988-2016, v.4002-->
<!DOCTYPE Composer PUBLIC "-//Arbortext//DTD Composer 1.0//EN"
"../doctypes/composer/composer.dtd" [
<!ENTITY % stock PUBLIC "-//Arbortext//DTD Fragment - ATI Stock filter list//EN" "../composer/stock.ent">
%stock;
]>
<?Pub Inc?>
<Composer>
<Label>Compose to PDF</Label>
. . .
<Resource>
<Label></Label>
<Documentation></Documentation>&epicGenerator;
&fileSerializer;
&serverProfiler;
&clientProfiler;
&xslTransformer;
&epicSerializer;
&switch;
&errorHandler;
&namespaceFixer;
&atiEventConverter;
&foPropagator;
&extensionHandler;
&ditaPostProcessor;
&ditaStyledElementsTranslator;
&atictFilter;
&applicabilityFilter;
</Resource>
And here are a few lines from one of the catalog files I need to reference:
PUBLIC "-//Arbortext//ENTITIES SAX Event Upstream Loop//EN" "upstreamLoop.ent"
PUBLIC "-//Arbortext//ENTITIES keyRef Resolver//EN" "keyRefResolver.ent"
PUBLIC "-//Arbortext//ENTITIES ATI Change Tracking Filter 1.0//EN" "atictFilter.ent"
PUBLIC "-//Arbortext//ENTITIES Font Filter 1.0//EN" "fontFilter.ent"
PUBLIC "-//Arbortext//ENTITIES Simple Attribute Cascader//EN" "simpleAttrCascader.ent"
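To make the shape of these entries concrete, here is a rough sketch that reads PUBLIC lines like the ones above into a map from public identifier to relative path. It is only an illustration of the line format shown in the excerpt: it assumes double-quoted values on a single line and ignores everything else (comments, CATALOG lines, other entry types), so it is not a substitute for a real catalog resolver. The class and method names are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class PublicEntrySketch {
    // Matches: PUBLIC "public identifier" "relative/path"
    private static final Pattern PUBLIC_ENTRY =
            Pattern.compile("^\\s*PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*$");

    /** Returns public identifier -> path (as written in the catalog), in file order. */
    static Map<String, String> parsePublicEntries(List<String> catalogLines) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : catalogLines) {
            Matcher m = PUBLIC_ENTRY.matcher(line);
            if (m.matches()) {
                entries.put(m.group(1), m.group(2));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "PUBLIC \"-//Arbortext//ENTITIES keyRef Resolver//EN\" \"keyRefResolver.ent\"",
                "PUBLIC \"-//Arbortext//ENTITIES Font Filter 1.0//EN\" \"fontFilter.ent\"");
        System.out.println(parsePublicEntries(lines));
    }
}
```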
I also looked at the question "Validate XML using XSD, a Catalog Resolver, and JAXP DOM for XSLT". I feel like it is unlikely to solve my problem, but I could be wrong.
I also reviewed the following web sites:
I have uploaded Java code, directory structure, and XML to http://aapro.net/CatalogTest.zip
It should be possible to add something to my program which accepts a path to the Test/doctypes folder (the folder, not the catalog file therein), and then the CatalogTest.xml file should parse successfully with the "Validate" option the program prompts for. Other (expensive) SGML/XML-aware software can do so. The catalog resolver, once given the absolute path to the Test/doctypes folder, should be able to follow the CATALOG directive in the Test/doctypes/catalog file to the Test/other/forms/catalog file, to the Test/other/forms/forms.dtd. The parser should be able to parse Test/other/forms/forms.dtd and use it to validate Test/CatalogTest.xml.
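The lookup chain described above boils down to resolving each relative path against the URI of the catalog file that contains it. A minimal sketch using only java.net.URI follows; the C:/Test location is a made-up install path, and the code illustrates the path arithmetic only, not how any particular resolver is implemented.

```java
import java.net.URI;

// Hypothetical helper for illustration only.
class CatalogChainSketch {

    /** Resolves a path written inside a catalog file against that catalog file's own URI. */
    static URI resolveAgainstCatalog(URI catalogFile, String relativePath) {
        return catalogFile.resolve(relativePath);
    }

    /** Follows Test/doctypes/catalog -> Test/other/forms/catalog -> forms.dtd. */
    static URI demo() {
        URI topLevelCatalog = URI.create("file:/C:/Test/doctypes/catalog");
        // CATALOG "../other/forms/catalog" in the top-level catalog
        URI formsCatalog = resolveAgainstCatalog(topLevelCatalog, "../other/forms/catalog");
        // PUBLIC "-//Test//Forms Document Type//EN" "forms.dtd" in the forms catalog
        return resolveAgainstCatalog(formsCatalog, "forms.dtd");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```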
Really, this whole process should be able to handle such catalog files OR catalog.xml files, and should be able to parse DTD or XSD files, and SGML or XML instances. But I don't actually care about SGML too much; there are only a few milspec situations still around that use it in my working environment.
I'd be willing to try more than one resolver and/or parser, or let the user make the selection.
(Also in the aforementioned zip file)
import java.io.File;

import javax.swing.JFileChooser;
import javax.swing.JOptionPane;
import javax.swing.filechooser.FileNameExtensionFilter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

public class ParseXmlWithCatalog {

    public static void main(String[] args) {
        int validating = JOptionPane.showOptionDialog(null, "Do you want validation?",
                "Please choose \"Yes\" for validation", JOptionPane.YES_NO_OPTION,
                JOptionPane.QUESTION_MESSAGE, null, null, JOptionPane.YES_OPTION);
        parseDoc(getFile(args), validating == JOptionPane.YES_OPTION);
    }

    private static boolean parseDoc(File inFile, boolean validate) {
        if (inFile == null) {
            JOptionPane.showMessageDialog(null, "Failure opening input XML.");
            return false; // nothing to parse
        }
        try {
            /*
            System.setProperty(DOMImplementationRegistry.PROPERTY, "org.apache.xerces.dom.DOMImplementationSourceImpl");
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSParser builder = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            LSParserFilter filter = new InputFilter();
            builder.setFilter(filter);
            */
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            // The factory is non-validating by default; when the user declines
            // validation, also tell the parser not to load the external DTD.
            if (!validate) {
                builderFactory.setValidating(false);
                builderFactory.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            }
            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            Document testDoc = builder.parse(inFile.getPath());
            System.out.println(testDoc.getFirstChild().getNodeName());
        } catch (Exception exc) {
            JOptionPane.showMessageDialog(null, "Failure parsing input XML: " + exc.getMessage());
            return false;
        }
        return true;
    }

    // Returns the file named on the command line, or lets the user pick one;
    // returns null if there are too many arguments or the dialog is cancelled.
    public static File getFile(String[] args) {
        if (args.length > 1) {
            JOptionPane.showMessageDialog(null, "Too many arguments.");
            return null;
        }
        if (args.length == 1) {
            return new File(args[0]);
        }
        JFileChooser fileChooser = new JFileChooser();
        fileChooser.setMultiSelectionEnabled(false);
        fileChooser.setDialogTitle("Select 1 XML file");
        FileNameExtensionFilter filter = new FileNameExtensionFilter("XML Files", "xml", "ditamap", "dita", "style");
        fileChooser.setFileFilter(filter);
        int response = fileChooser.showOpenDialog(null);
        if (response != JFileChooser.APPROVE_OPTION) {
            // aborted
            return null;
        }
        return fileChooser.getSelectedFile();
    }
}
Upvotes: 0
Views: 2432
Reputation: 311
I'm posting this sample code because it incorporates the use of org.apache.xml.resolver.tools.CatalogResolver as suggested by mzjn, and successfully works with my sample at http://aapro.net/CatalogTest.zip. That is, if I run it, and answer the first prompt with Yes (I want validation) and the second with the absolute path to the Test\doctypes folder, and then browse to CatalogTest.xml, it successfully follows the CATALOG directive in Test\doctypes to "../other/forms/catalog" which in turn specifies the location of the DTD with: PUBLIC "-//Test//Forms Document Type//EN" "forms.dtd", and tells me my top-level node is called "form".
At this point, I'm going to incorporate this solution into my XML diff program. If I find adjustments are needed to handle a multi-entry catalog path and/or handle a mix of catalog and catalog.xml files, I'll post an update. But that could be a few more weeks (or longer), and I thought this code was to the point where someone might find it helpful.
import java.io.File;
import java.io.FilenameFilter;

import javax.swing.JFileChooser;
import javax.swing.JOptionPane;
import javax.swing.filechooser.FileNameExtensionFilter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.xml.resolver.tools.CatalogResolver;
import org.w3c.dom.Document;

public class ParseXmlWithCatalog {

    // Offer end-user the convenience of not having to specify which will be used,
    // catalog and/or catalog.xml.
    private static FilenameFilter catalogFileFilter = new FilenameFilter() {
        @Override
        public boolean accept(File dir, String name) {
            return name.equals("catalog") || name.equals("catalog.xml");
        }
    };

    public static void main(String[] args) {
        int validating = JOptionPane.showOptionDialog(null, "Do you want validation?",
                "Please choose \"Yes\" for validation", JOptionPane.YES_NO_OPTION, JOptionPane.QUESTION_MESSAGE, null,
                null, JOptionPane.YES_OPTION);
        if (validating == JOptionPane.YES_OPTION) {
            String catPath = JOptionPane.showInputDialog(null,
                    "Please enter semi-colon-separated list of absolute paths to catalog folders, in desired search order; these are the locations of catalog or catalog.xml files, not the filenames.",
                    "Enter catalog path", JOptionPane.QUESTION_MESSAGE);
            if (catPath == null) {
                // input dialog was cancelled
                return;
            }
            String[] catLocs = catPath.split(";");
            StringBuilder sb = new StringBuilder();
            for (String catLoc : catLocs) {
                File[] catFiles = new File(catLoc.trim()).listFiles(catalogFileFilter);
                if (catFiles == null) {
                    // listFiles returns null if the path is not a readable folder
                    continue;
                }
                for (File catFile : catFiles) {
                    if (sb.length() > 0) {
                        sb.append(";");
                    }
                    sb.append(catFile.toURI());
                }
            }
            // CatalogResolver picks up its catalog list from this system property.
            System.setProperty("xml.catalog.files", sb.toString());
            System.out.println(
                    "Using the following top-level catalog files:\n" + System.getProperty("xml.catalog.files"));
            System.setProperty("relative-catalogs", "yes");
        }
        parseDoc(getFile(args), validating == JOptionPane.YES_OPTION);
    }

    private static boolean parseDoc(File inFile, boolean validate) {
        if (inFile == null) {
            JOptionPane.showMessageDialog(null, "Failure opening input XML.");
            return false; // nothing to parse
        }
        try {
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            if (!validate) {
                builderFactory.setValidating(false);
                builderFactory.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            }
            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            // Route DTD and entity lookups through the catalog files.
            CatalogResolver resolver = new CatalogResolver();
            builder.setEntityResolver(resolver);
            Document testDoc = builder.parse(inFile.getPath());
            JOptionPane.showMessageDialog(null,
                    "The top level node is \"" + testDoc.getFirstChild().getNodeName() + "\"");
        } catch (Exception exc) {
            JOptionPane.showMessageDialog(null, "Failure parsing input XML: " + exc.getMessage());
            return false;
        }
        return true;
    }

    public static File getFile(String[] args) {
        if (args.length > 1) {
            JOptionPane.showMessageDialog(null, "Too many arguments.");
            return null;
        }
        if (args.length == 1) {
            return new File(args[0]);
        }
        JFileChooser fileChooser = new JFileChooser();
        fileChooser.setMultiSelectionEnabled(false);
        fileChooser.setDialogTitle("Select 1 XML file");
        FileNameExtensionFilter filter = new FileNameExtensionFilter("XML Files", "xml", "ditamap", "dita", "style");
        fileChooser.setFileFilter(filter);
        int response = fileChooser.showOpenDialog(null);
        if (response != JFileChooser.APPROVE_OPTION) {
            // aborted
            return null;
        }
        return fileChooser.getSelectedFile();
    }
}
Upvotes: 0
Reputation: 50947
The Apache XML Commons Resolver supports both OASIS XML Catalogs and the older OASIS TR9401 Catalogs format. See https://xerces.apache.org/xml-commons/components/resolver/.
To enable catalog lookup in your test project, do as follows:
1. Download XML Commons Resolver from http://xerces.apache.org/mirrors.cgi#binary.
2. Extract resolver.jar and add it to your classpath.
3. Create a text file called CatalogManager.properties and put it on your classpath. In this file, add the path to the catalog(s):
   catalogs=./doctypes/catalog
   The locations of catalog files can also be specified via the xml.catalog.files Java system property.
4. In ParseXmlWithCatalog.java, add an import statement and create an instance of CatalogResolver. Set that instance as the parser's EntityResolver:
import org.apache.xml.resolver.tools.CatalogResolver;
...
CatalogResolver cr = new CatalogResolver();
builder.setEntityResolver(cr);
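To see what setEntityResolver changes, here is a rough stand-in that uses a hand-written EntityResolver instead of CatalogResolver, so that it runs without resolver.jar. The public identifier is the one from the sample catalog; the class name, the inline DTD text, and the greeting entity are invented for illustration. CatalogResolver plays the same role, except that it finds the replacement by looking up the identifier in the catalog files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical demo class; the resolver below is a stand-in for CatalogResolver.
class EntityResolverSketch {
    static final String PUBLIC_ID = "-//Test//Forms Document Type//EN";

    /** Parses a document whose DTD path does not exist and returns the root element's text. */
    static String parseRootText() {
        String xml = "<!DOCTYPE form PUBLIC \"" + PUBLIC_ID + "\" \"no-such-dir/forms.dtd\">"
                + "<form>&greeting;</form>";
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(new EntityResolver() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    if (PUBLIC_ID.equals(publicId)) {
                        // Supply the DTD ourselves instead of letting the parser open systemId.
                        InputSource dtd = new InputSource(new StringReader(
                                "<!ELEMENT form (#PCDATA)><!ENTITY greeting \"hello\">"));
                        dtd.setSystemId(systemId);
                        return dtd;
                    }
                    return null; // fall back to the parser's default lookup
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseRootText());
    }
}
```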
Upvotes: 1