J. Nicholas

Reputation: 105

Streaming xml-to-json

I have a number of very large XML files that I would like to convert to (equally large) JSON files. To do so, I've written an XSLT stylesheet that converts the XML to the intermediate XML representation of JSON defined in the XSLT 3.0 specification. I can then call the fn:xml-to-json function on the result.

However, I would like to stream this process such that memory usage remains stable. Is this possible?

Upvotes: 0

Views: 559

Answers (1)

Michael Kay

Reputation: 163458

Unfortunately it is not possible (either according to the XSLT 3.0 specification from W3C, or in the Saxon implementation) to write a multi-phase streaming transformation within a single stylesheet. Normally there are two ways of writing a multi-phase transformation (that is, a transformation that is the composition of two separate transformations): either the first phase can be invoked as a function, or the first phase can deliver its results in a variable. Neither of these mechanisms allows the intermediate results (the results of the first phase) to be delivered as a streamed document.

(We were aware of this limitation while designing the spec, but decided to leave it out of the requirements, as things were getting complicated enough already.)

But it can be done in Saxon, I believe, using a multi-phase transformation implemented using two separate stylesheets chained together. The simplest way of doing this is probably with the s9api interface. Write the second transformation t2.xsl (which simply calls xml-to-json) like this:

<xsl:transform version="3.0" expand-text="yes"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:mode streamable="yes"/>
<xsl:output method="text"/>
<xsl:template match="/">{xml-to-json(.)}</xsl:template>
</xsl:transform>

and then do:

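// Processor(true) requests a licensed Saxon edition
Processor proc = new Processor(true);
Serializer out = proc.newSerializer(new File("out.xml"));
XsltCompiler comp = proc.newXsltCompiler();

// Phase 2: t2.xsl receives the output of phase 1 and writes to the serializer
Xslt30Transformer t2 = comp.compile(new File("t2.xsl")).load30();
Destination phase2 = t2.asDocumentDestination(out);

// Phase 1: t1.xsl reads the input (source) and sends its result to phase 2
Xslt30Transformer t1 = comp.compile(new File("t1.xsl")).load30();
t1.applyTemplates(source, phase2);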
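
For illustration, the intermediate document that t1.xsl passes to t2.xsl would look something like the sketch below. The element names and the namespace come from the XML representation of JSON defined in the XSLT 3.0 specification; the keys and values are invented for this example and are not taken from the question.

```xml
<!-- Hypothetical intermediate output of t1.xsl; keys and values are made up -->
<map xmlns="http://www.w3.org/2005/xpath-functions">
  <string key="name">example</string>
  <number key="count">3</number>
  <array key="tags">
    <string>a</string>
    <string>b</string>
  </array>
</map>
```

Given this input, xml-to-json should produce JSON equivalent to {"name":"example","count":3,"tags":["a","b"]}.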

A caveat: although the input document is streamed and the intermediate XML is streamed, the output JSON is (I think) constructed in memory in its entirety before being written out to the output file. That's a bit unfortunate, and we should try and fix it.

Upvotes: 2
