Reputation: 19905
I need to process a bunch of very large XML files and read each element depth-first. Due to their size, any DOM
solution is out of the question, and things are further complicated by the fact that the element actually needed is not the "leaf" but its parent.
More specifically, the files have a structure like:
<Level 1>
  ...
  <Level 2>
    ...
    <Level N-1>
      <value>...</value>
      <value>...</value>
      ...
      <value>...</value>
    </Level N-1>
    <Level N-1>
      <value>...</value>
      <value>...</value>
      ...
      <value>...</value>
    </Level N-1>
    ...
    <Level N-1>
      <value>...</value>
      <value>...</value>
      ...
      <value>...</value>
    </Level N-1>
    ...
  </Level 2>
</Level 1>
Out of each file like the above, the <Level N-1> elements need to be read individually (each including all of its <value> elements). The depth, N, varies within each file and across files, so it is essentially unknown, as are the XML tag names. Things are further complicated by the fact that <value> elements also exist at higher levels (i.e., their presence is no guarantee that Level N has been reached).
A quick solution for reading an entire XML element at a specific depth as a string is something like
int level = 0; // Depth relative to the element being copied, which may sit at any depth
Reader in = ... // The reader to the input
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
PrintStream out = new PrintStream(outStream);
XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
XMLEvent event;
// Assumes the next event is the start tag of the element to copy.
// do-while: level is 0 until that first start tag has been read.
do
{
    event = reader.nextEvent();
    if (event.isStartElement())
    {
        level++;
    }
    else if (event.isEndElement())
    {
        level--;
    }
    writer.add(event);
} while ((level > 0) && reader.hasNext());
writer.flush();
String element = new String(outStream.toByteArray());
The above, however, is not helpful if the calling code does not know that a Level N-1 element has been reached and advances to Level N (i.e., to the <value> elements).
A SAX solution would be ideal, but even preprocessing the file via an XSLT template is acceptable.
Any ideas?
Upvotes: 2
Views: 963
Reputation: 41137
If I have understood your issue correctly, you're having difficulty telling when you have reached a <value> tag and finished going through the level tags.
When you receive an event, you can get further information, such as its name, out of it:
if (event.isStartElement()) {
    StartElement element = event.asStartElement();
    System.out.println("Start Element: " + element.getName());
}
If what you really want is the last level before this, of course you'll have to hold onto it.
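One way to "hold onto it" can be sketched as follows. This is only an illustration under an assumption the question does not state outright: a "Level N-1" element is taken to be any element that has child elements but no grandchild elements. The names DeepestParents, Frame, extract and serialize are made up for this sketch. Each open element buffers its own events until a grandchild shows up, so at most two buffers (the current element and its parent) are alive at any time.

```java
import java.io.Reader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: streams the input and hands every element that has
// child elements but no grandchild elements to a callback, as a string.
class DeepestParents {

    // State for one currently open element.
    private static final class Frame {
        List<XMLEvent> events = new ArrayList<>(); // set to null once a grandchild is seen
        boolean hasChildElement = false;
    }

    static void extract(Reader in, Consumer<String> sink) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        Deque<Frame> open = new ArrayDeque<>(); // head = innermost open element

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                Iterator<Frame> it = open.iterator();
                if (it.hasNext()) {
                    it.next().hasChildElement = true;   // the parent gains a child
                    if (it.hasNext()) {
                        it.next().events = null;        // the grandparent is ruled out
                    }
                }
                open.push(new Frame());
            }
            // Record the event in every frame that is still a candidate.
            for (Frame frame : open) {
                if (frame.events != null) {
                    frame.events.add(event);
                }
            }
            if (event.isEndElement()) {
                Frame closed = open.pop();
                // Leaves have no child elements; higher levels lost their buffer.
                if (closed.events != null && closed.hasChildElement) {
                    sink.accept(serialize(closed.events, outFactory));
                }
            }
        }
        reader.close();
    }

    private static String serialize(List<XMLEvent> events, XMLOutputFactory factory)
            throws XMLStreamException {
        StringWriter text = new StringWriter();
        XMLEventWriter writer = factory.createXMLEventWriter(text);
        for (XMLEvent e : events) {
            writer.add(e);
        }
        writer.flush();
        writer.close();
        return text.toString();
    }
}
```

A call such as DeepestParents.extract(reader, System.out::println) would print each matching subtree as it closes. Memory use is bounded by the largest element that has only leaf children, plus whatever <value> elements a higher level contains before its first nested level starts. If your real "Level N-1" elements can be told apart some other way, the test in the isEndElement branch is the place to change.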
Upvotes: 1
Reputation: 243529
The wanted XSLT pre-processing isn't possible in pure XSLT 1.0 or XSLT 2.0 because an XSLT processor (1.0 or 2.0) typically produces a representation (not necessarily DOM) of the whole XML document in memory.
In XSLT 3.0 (still a WD) there will be streaming as part of the language, but this is still under active development by the W3C XSLT WG and the specification isn't yet stable.
Saxon has streaming extensions in the form of streaming templates that are in a "streamable mode":
<xsl:mode name="s" streamable="yes"/>
With these, it could be possible to produce XML documents each containing just the subtree rooted in a "Level N-1" element.
Upvotes: 3