Icebreaker

Reputation: 297

Splitting a big XML file into smaller ones

I'm currently working on a project that requires me to split an XML file. Here is a sample:

<Lakes>
  <Lake>
    <id>1</id>
    <Name>Caspian</Name>
    <Type>Natural</Type>
  </Lake>
  <Lake>
    <id>2</id>
    <Name>Moreo</Name>
    <Type>Glacial</Type>
  </Lake>
  <Lake>
    <id>3</id>
    <Name>Sina</Name>
    <Type>Artificial</Type>
  </Lake>
</Lakes>

Ideally, my Java code would split the XML into three small documents for this example and send each one out using a messenger service. The code for the messenger service is not important; I already have that done.

So, for example, the code would run and split off the first part like this:

<Lakes>
  <Lake>
    <id>1</id>
    <Name>Caspian</Name>
    <Type>Natural</Type>
  </Lake>
</Lakes>

The Java code would then send this out in a message, move on to the next part, send that out, and so on until it reaches the end of the big XML. This can be done through XSLT or through Java; it doesn't matter. Any ideas?

To be clear, I pretty much know how to break up a file using XSLT, but I don't know how to break it up and send each part individually, one at a time. I also don't want to store anything locally, so ideally each part would be converted to a string and sent out.

Upvotes: 2

Views: 1436

Answers (2)

GETah

Reputation: 21449

I would stream the XML (instead of building a DOM tree in memory) and cut the chunks out on the fly. Whenever you meet a <Lake> start tag, start copying the content into a buffer; when the closing </Lake> tag is met, send the buffer and reset it.

EDIT: Have a look at this link to learn more about XML streaming in Java.
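
The buffer-and-reset idea could look roughly like the SAX sketch below. The class name SaxLakeSplitter, the Consumer<String> standing in for the messenger service, and the split helper are all made up for illustration. It also assumes that <Lake> elements are never nested, that there are no namespaces, and that wrapping each chunk in a bare <Lakes> root is acceptable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: copies each <Lake> into a buffer and hands it off at </Lake>.
class SaxLakeSplitter extends DefaultHandler {
    private final Consumer<String> sender;            // stand-in for the messenger service
    private final StringBuilder buffer = new StringBuilder();
    private boolean insideLake = false;

    SaxLakeSplitter(Consumer<String> sender) {
        this.sender = sender;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("Lake".equals(qName)) {
            // A new chunk starts: reset the buffer and open the wrapper root.
            insideLake = true;
            buffer.setLength(0);
            buffer.append("<Lakes>");
        }
        if (insideLake) {
            buffer.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                buffer.append(' ').append(atts.getQName(i))
                      .append("=\"").append(escape(atts.getValue(i))).append('"');
            }
            buffer.append('>');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (insideLake) {
            buffer.append(escape(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (insideLake) {
            buffer.append("</").append(qName).append('>');
        }
        if ("Lake".equals(qName)) {
            // The chunk is complete: close the wrapper root and send it.
            buffer.append("</Lakes>");
            sender.accept(buffer.toString());
            insideLake = false;
        }
    }

    // Re-escape text, because the parser delivers it already unescaped.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Convenience helper: collects the chunks in a list instead of sending them.
    public static List<String> split(String xml) throws Exception {
        List<String> chunks = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new SaxLakeSplitter(chunks::add));
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Lakes><Lake><id>1</id><Name>Caspian</Name></Lake>"
                   + "<Lake><id>2</id><Name>Moreo</Name></Lake></Lakes>";
        split(xml).forEach(System.out::println);
    }
}
```

In real use you would pass your messenger call as the Consumer instead of chunks::add, and feed the parser an InputStream rather than a String so the big file never sits in memory in one piece.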

Upvotes: 2

biziclop

Reputation: 49804

If the way you have to chunk your files is fixed and known, the easiest solution is to use SAX or StAX to do it programmatically. I personally prefer StAX for this kind of task, as the code is generally cleaner and easier to understand, but SAX will do the job equally well.
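
For illustration, a StAX version might look something like the sketch below. It uses the event API (XMLEventReader and XMLEventWriter) from javax.xml.stream. The class name StaxLakeSplitter and the split method are hypothetical, and the sketch assumes the sample's structure: un-nested <Lake> elements with no namespaces, each rewrapped in a plain <Lakes> root.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical splitter: pulls events one at a time and writes each <Lake> to its own string.
class StaxLakeSplitter {

    public static List<String> split(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();

        List<String> chunks = new ArrayList<>();
        StringWriter buffer = null;
        XMLEventWriter writer = null;   // non-null only while inside a <Lake>

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();

            if (event.isStartElement()
                    && "Lake".equals(event.asStartElement().getName().getLocalPart())) {
                // Start a fresh in-memory document for this chunk.
                buffer = new StringWriter();
                writer = outputFactory.createXMLEventWriter(buffer);
                writer.add(eventFactory.createStartElement("", "", "Lakes"));
            }

            if (writer != null) {
                writer.add(event);      // copy the event into the current chunk
            }

            if (writer != null && event.isEndElement()
                    && "Lake".equals(event.asEndElement().getName().getLocalPart())) {
                // Close the wrapper root and hand the finished chunk over.
                writer.add(eventFactory.createEndElement("", "", "Lakes"));
                writer.flush();
                writer.close();
                chunks.add(buffer.toString());   // replace with the messenger call
                writer = null;
            }
        }
        reader.close();
        return chunks;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Lakes><Lake><id>1</id><Name>Caspian</Name></Lake>"
                   + "<Lake><id>2</id><Name>Moreo</Name></Lake></Lakes>";
        split(xml).forEach(System.out::println);
    }
}
```

Only one <Lake> is held in memory at a time, and nothing touches the disk. For a genuinely large input, you would create the reader from an InputStream and send each chunk as soon as it is complete rather than collecting them in a list.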

XSLT is a great tool, but its main drawback is that an XSLT 1.0 transformation can only produce one output. And apart from a few exceptions, XSLT engines don't support streaming processing, so if the initial file is too big to fit in memory, you can't use them.

Update: In XSLT 2.0, <xsl:result-document> can be used to produce multiple output files, but it's not ideal if you want to get your chunks one by one without storing them in files.

Upvotes: 3
