Reputation: 23275
If I have XML documents that follow the pattern below, how can I generate XSDs for these documents but convert the comments to annotation tags?
<ELEMENT1>
<!--
documentation text ....
-->
<ELEMENT2>
<ELEMENT3>ABC</ELEMENT3>
</ELEMENT2>
<!--
documentation text ....
-->
<ELEMENT4>
<ELEMENT5>0534564117</ELEMENT5>
<ELEMENT6>123456</ELEMENT6>
<ELEMENT7>090314b4-fc7d-42c5-b382-a5b745671ee32b</ELEMENT7>
</ELEMENT4>
</ELEMENT1>
Upvotes: 1
Views: 4199
Reputation: 25034
It's hard to tell from your question what you are finding difficult here. What have you tried? What have you thought about but not yet tried for some reason or another? What was that reason?
There are a number of tools for generating a document grammar (in the form of a DTD, Relax NG schema, or XSD schema) from a collection of XML documents; a search for "grammar induction" or "grammar inference" and "XML" will turn up some tools (on Stack Overflow, a search for "XML Trang" or "xml xsd.exe" will produce a number of hits), and I believe it is not uncommon for XML-oriented development environments to include functionality for generating schemas from samples (often using the same open-source tools under the hood). It is the nature of such tools, however, to try to infer a general grammar from several samples, which means it is unlikely that the comments in any one of the input files will be interesting or important enough to merit inclusion in the schema. So you're unlikely to find an off-the-shelf grammar inference tool with a switch to make it copy comments in the input into annotations in the output.
The heading of your question, on the other hand, seems to make it sound as if you already know how to generate an XSD schema from your XML input and you are only seeking advice on how to make the comments in the XML accessible to the process that is generating the schema. In that case, the answer is: use a programming language, or an XML parser interface, that gives you access to the comments. XSLT or SAX2 are obvious choices. (On the other hand, it's unlikely that anyone who knows XML well enough to know how to generate a useful schema from a collection of XML instances could be in any doubt about how to read comments in XML input. So I guess this is not really the issue.)
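To make the "parser interface that gives you access to the comments" point concrete, here is a minimal sketch using Python's standard xml.sax module, which exposes the SAX2 lexical-handler property. The names CommentCollector and collect_comments are made up for illustration, and the sketch assumes (as in the sample in the question) that each comment documents the next element that starts after it.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class CommentCollector(ContentHandler):
    """Hypothetical handler: pairs each comment with the next element started.

    It doubles as the SAX2 lexical handler, so it also defines the
    lexical callbacks (comment, startDTD, endDTD, startCDATA, endCDATA).
    """

    def __init__(self):
        super().__init__()
        self.pending = None  # most recent comment not yet attached to an element
        self.found = []      # list of (element name, comment text)

    # Lexical-handler callbacks; only comment() does real work here.
    def comment(self, text):
        self.pending = text.strip()

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass

    # Content-handler callback.
    def startElement(self, name, attrs):
        # Assumption: a comment documents the element that follows it.
        if self.pending is not None:
            self.found.append((name, self.pending))
            self.pending = None


def collect_comments(xml_text):
    """Parse xml_text and return [(element name, comment text), ...]."""
    handler = CommentCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # Comments are only reported if a lexical handler is registered.
    parser.setProperty(property_lexical_handler, handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.found
```

Calling collect_comments on a document shaped like the one in the question should give one pair per commented element, which is the information a schema-generating process would need in order to emit annotations.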
Your alternatives include:
- Writing an XSLT stylesheet with <xsl:template match="comment()"> ... </xsl:template> templates to handle comments in the input and generate xs:documentation elements in the XSD schema document produced as output.
- Producing an XSD schema document with xs:annotation and xs:documentation elements inserted at appropriate points, containing the comments from the XML input.
Upvotes: 2
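As a rough illustration of what "xs:annotation and xs:documentation elements containing the comments from the XML input" could look like in practice, here is a small Python sketch using xml.etree.ElementTree. It is not a schema generator: it does no type or structure inference and simply emits one flat xs:element declaration per commented element. The function name annotated_declarations is hypothetical, the sketch assumes each comment documents its next sibling element, and it needs Python 3.8 or later for insert_comments.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize with the conventional xs: prefix


def q(local):
    """Qualified name in the XSD namespace, in ElementTree's {uri}local form."""
    return f"{{{XS}}}{local}"


def annotated_declarations(xml_text):
    """Return an xs:schema element holding one documented xs:element
    per element that is immediately preceded by a comment sibling."""
    # insert_comments=True keeps comments in the tree instead of dropping them.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)

    schema = ET.Element(q("schema"))
    for parent in root.iter():
        pending = None  # comment text waiting for the next sibling element
        for child in parent:
            if child.tag is ET.Comment:
                pending = (child.text or "").strip()
            elif pending is not None:
                decl = ET.SubElement(schema, q("element"), name=child.tag)
                annotation = ET.SubElement(decl, q("annotation"))
                documentation = ET.SubElement(annotation, q("documentation"))
                documentation.text = pending
                pending = None
    return schema
```

Passing the result to ET.tostring(schema, encoding="unicode") gives a schema fragment you could merge by hand, or by script, into the output of whatever inference tool you use for the actual structure.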