CJ7

How can I get XML comments to appear in generated XSD?

If I have XML documents that follow the pattern below, how can I generate XSDs for these documents while converting the comments into annotation tags?

<ELEMENT1>

  <!--
  documentation text ....
  -->
  <ELEMENT2>
    <ELEMENT3>ABC</ELEMENT3>
  </ELEMENT2>

  <!--
  documentation text ....
  -->
  <ELEMENT4>
    <ELEMENT5>0534564117</ELEMENT5>
    <ELEMENT6>123456</ELEMENT6>
    <ELEMENT7>090314b4-fc7d-42c5-b382-a5b745671ee32b</ELEMENT7>
  </ELEMENT4>

</ELEMENT1>

Answers (1)

C. M. Sperberg-McQueen

It's hard to tell from your question what you are finding difficult here. What have you tried? What have you thought about but not yet tried for some reason or another? What was that reason?

There are a number of tools for generating a document grammar (in the form of a DTD, RELAX NG schema, or XSD schema) from a collection of XML documents. A search for "grammar induction" or "grammar inference" together with "XML" will turn up some of them (on Stack Overflow, searches for XML Trang or xml xsd.exe produce a number of hits), and I believe it is not uncommon for XML-oriented development environments to include functionality for generating schemas from samples (often using the same open-source tools under the hood). It is the nature of such tools, however, to infer a general grammar from several samples, which means it is unlikely that the comments in any one input file will be interesting or important enough to merit inclusion in the schema. So you're unlikely to find an off-the-shelf grammar inference tool with a switch that copies comments in the input into annotations in the output.

The heading of your question, on the other hand, seems to suggest that you already know how to generate an XSD schema from your XML input and are only seeking advice on how to make the comments in the XML accessible to the process that generates the schema. In that case, the answer is: use a programming language, or an XML parser interface, that gives you access to the comments. XSLT and SAX2 are obvious choices. (Then again, anyone who knows XML well enough to generate a useful schema from a collection of XML instances is unlikely to be in any doubt about how to read comments in XML input, so I suspect this is not really the issue.)
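As a rough illustration of the SAX2 route, here is a minimal Python sketch using the standard library's `xml.sax` together with the SAX2 lexical-handler property, which is what exposes comments to the application. It assumes, as in the sample above, that each comment documents the element whose start tag follows it. The names `CommentCollector` and `collect_comments` are hypothetical, and the result is simply a dictionary from element name to comment text that a schema generator could then consume.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class CommentCollector(ContentHandler):
    """Attaches each comment to the element whose start tag comes next.

    Assumption (matching the sample in the question): a comment documents
    the element that immediately follows it.
    """

    def __init__(self):
        super().__init__()
        self.pending = None  # most recent comment not yet attached
        self.docs = {}       # element name -> documentation text

    # LexicalHandler (SAX2 extension) callbacks; only comment() does work.
    def comment(self, content):
        self.pending = content.strip()

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass

    # ContentHandler callbacks
    def startElement(self, name, attrs):
        if self.pending is not None:
            # Keep the first comment seen for a given element name.
            self.docs.setdefault(name, self.pending)
            self.pending = None

    def endElement(self, name):
        # A comment just before a closing tag documents nothing that follows.
        self.pending = None


def collect_comments(xml_bytes):
    handler = CommentCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # Without this property the parser silently discards comments.
    parser.setProperty(property_lexical_handler, handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.docs


if __name__ == "__main__":
    sample = b"""<ELEMENT1>
  <!-- documentation for ELEMENT2 -->
  <ELEMENT2><ELEMENT3>ABC</ELEMENT3></ELEMENT2>
  <!-- documentation for ELEMENT4 -->
  <ELEMENT4><ELEMENT5>0534564117</ELEMENT5></ELEMENT4>
</ELEMENT1>"""
    print(collect_comments(sample))
```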

Your alternatives include:

  • Use a SAX2 interface (or any other parser API that exposes comments) to read the XML instance and generate the schema, in the programming language of your choice.
  • Write the schema generator in XSLT, and use <xsl:template match="comment()"> ... </xsl:template> templates to handle comments in the input and generate xs:documentation elements in the XSD schema document produced as output.
  • Use an off-the-shelf schema generator (say, Trang) to generate a schema document for your data, and then write an XSLT stylesheet or SAX filter to re-read the XSD schema document and your XML input, extract the comments in the XML input, identify the declarations the comments relate to, and write out the schema document again with xs:annotation and xs:documentation elements inserted at appropriate points, containing the comments from the XML input.
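A minimal sketch of the third alternative, using Python's `xml.etree.ElementTree` in place of an XSLT stylesheet or SAX filter (a swapped-in tool, not one named above). It assumes Python 3.8+ for `TreeBuilder(insert_comments=True)`, assumes each comment documents the element that immediately follows it, and matches declarations purely by element name. `SCHEMA` is a hand-written stand-in for generator output (real Trang output will differ in detail), and `comments_by_element` and `annotate` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize with the conventional xs: prefix

# Hypothetical stand-in for a generated schema (global-element style).
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="ELEMENT1">
    <xs:complexType><xs:sequence>
      <xs:element ref="ELEMENT2"/><xs:element ref="ELEMENT4"/>
    </xs:sequence></xs:complexType>
  </xs:element>
  <xs:element name="ELEMENT2">
    <xs:complexType><xs:sequence><xs:element ref="ELEMENT3"/></xs:sequence></xs:complexType>
  </xs:element>
  <xs:element name="ELEMENT3" type="xs:string"/>
  <xs:element name="ELEMENT4">
    <xs:complexType><xs:sequence>
      <xs:element ref="ELEMENT5"/><xs:element ref="ELEMENT6"/><xs:element ref="ELEMENT7"/>
    </xs:sequence></xs:complexType>
  </xs:element>
  <xs:element name="ELEMENT5" type="xs:string"/>
  <xs:element name="ELEMENT6" type="xs:string"/>
  <xs:element name="ELEMENT7" type="xs:string"/>
</xs:schema>"""


def comments_by_element(instance_xml):
    """Map element name -> text of the comment immediately before it (same parent)."""
    builder = ET.TreeBuilder(insert_comments=True)  # keep comments as tree nodes
    root = ET.fromstring(instance_xml, parser=ET.XMLParser(target=builder))
    docs = {}
    for parent in root.iter():
        children = list(parent)
        for prev, nxt in zip(children, children[1:]):
            if prev.tag is ET.Comment and isinstance(nxt.tag, str):
                docs.setdefault(nxt.tag, (prev.text or "").strip())
    return docs


def annotate(schema_xml, docs):
    """Insert xs:annotation/xs:documentation into matching xs:element declarations."""
    root = ET.fromstring(schema_xml)
    for decl in list(root.iter(f"{{{XS}}}element")):
        text = docs.get(decl.get("name"))  # ref= particles have no name, so None
        if text is None:
            continue
        annotation = ET.Element(f"{{{XS}}}annotation")
        ET.SubElement(annotation, f"{{{XS}}}documentation").text = text
        decl.insert(0, annotation)  # xs:annotation must be the first child
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    instance = """<ELEMENT1>
  <!-- documentation for ELEMENT2 -->
  <ELEMENT2><ELEMENT3>ABC</ELEMENT3></ELEMENT2>
  <!-- documentation for ELEMENT4 -->
  <ELEMENT4>
    <ELEMENT5>0534564117</ELEMENT5><ELEMENT6>123456</ELEMENT6>
    <ELEMENT7>090314b4-fc7d-42c5-b382-a5b745671ee32b</ELEMENT7>
  </ELEMENT4>
</ELEMENT1>"""
    print(annotate(SCHEMA, comments_by_element(instance)))
```

Matching by name is the simplest way to identify the declarations the comments relate to; if the same element name is declared locally in several contexts with different meanings, you would need to match on the element's path from the root instead.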

