Reputation: 58

I need to parse through a large XML file. Best practices?

I have a large XML file with the following structure.

<tree>
    <limb>
        <DATA0>
    </limb>
    <limb>
        <DATA1>
    </limb>
    <limb>
        <DATA2>
    </limb>
</tree>

There are several thousand limb elements, each with child elements. I need to parse through this file, and extract the limb elements in sets of 100 - 200 items, and create a new XML file from the data.

Is there a preferred method for performing this operation? I only know C# at a novice/intermediate level, and have worked with XML files for a while.

I am considering writing a loop that counts the total number of limb elements, then calculating how many new XML documents I will need (5000 limb elements / batches of 200 == 25 XML documents). From there I would read the first 200 limb elements, copy them into a new file, save it, and repeat until the end of the file.
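As a rough illustration of the calculation described above, here is a minimal sketch. `BatchCount` is a hypothetical helper name, and the code assumes a compiler that supports top-level statements (C# 9 or later). It rounds up so that a partial final batch still gets its own document.

```csharp
using System;

// Hypothetical helper: how many output documents a given element count needs.
static int BatchCount(int totalLimbs, int batchSize)
{
    // Integer ceiling division: a partial final batch still counts as one document.
    return (totalLimbs + batchSize - 1) / batchSize;
}

Console.WriteLine(BatchCount(5000, 200)); // prints 25
Console.WriteLine(BatchCount(5001, 200)); // prints 26
```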

Does my logic seem flawed?

Upvotes: 1

Answers (4)

Michael Kay

Reputation: 163322

There might be an excuse to write this in C# if you were expert in C# and didn't have time to learn anything else, but since that isn't the case, XSLT is a much better tool for the job - especially XSLT 2.0, since that can produce multiple output files. (There are two XSLT 2.0 processors you can use in a C# environment - Saxon and XQSharp). It looks a very simple job in XSLT, something like:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:template match="/">
  <xsl:for-each-group select="//limb" group-adjacent="(position()-1) idiv 200">
    <xsl:result-document href="batch{position()}.xml">
      <batch>
        <xsl:copy-of select="current-group()"/>
      </batch>
    </xsl:result-document>
  </xsl:for-each-group>
</xsl:template>

</xsl:stylesheet>

Upvotes: 0

Chuck Savage

Reputation: 11955

Linq-To-XML as Robert linked would look like:

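// Requires: using System.Linq; using System.Xml.Linq;
XElement xfile = XElement.Load(file);           // file: path to the source XML
var limbs = xfile.Elements("limb").ToList();    // Elements() is lazy, so materialize it once
int count = limbs.Count;
var first200 = limbs.Take(200);
var next200 = limbs.Skip(200).Take(200);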
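The Take/Skip calls above could be wrapped in a loop to cover the whole file. The following is only a sketch under a few assumptions: `SplitIntoBatches` is a hypothetical helper name, each batch reuses the source root's element name, and a small in-memory string stands in for the real file so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: splits the <limb> children of a root element into
// separate documents holding at most batchSize limbs each.
static List<XDocument> SplitIntoBatches(XElement source, int batchSize)
{
    // Materialize once so Skip/Take work on a list instead of re-walking the tree.
    List<XElement> limbs = source.Elements("limb").ToList();
    var batches = new List<XDocument>();

    for (int start = 0; start < limbs.Count; start += batchSize)
    {
        // Elements that already have a parent are cloned when added to a new
        // XElement, so the source tree is left unchanged.
        var root = new XElement(source.Name, limbs.Skip(start).Take(batchSize));
        batches.Add(new XDocument(root));
    }
    return batches;
}

// In-memory stand-in for the real input; XElement.Load(path) would read a file instead.
XElement sample = XElement.Parse("<tree><limb>a</limb><limb>b</limb><limb>c</limb></tree>");
List<XDocument> docs = SplitIntoBatches(sample, 2);
Console.WriteLine(docs.Count); // prints 2
```

With a real file, each resulting document could then be written out with `XDocument.Save`, using whatever file naming scheme suits the job.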

Upvotes: 2

kristianp

Reputation: 5895

If the document is too large to load into memory, you can use XmlReader, which reads forward through the file one node at a time. You obtain a reader by calling XmlReader.Create. Unless the file is greater than, say, 10-20% of the size of your RAM, or you need it to be fast, it probably isn't worth the extra effort, though.
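For reference, a streaming read might look roughly like the sketch below. `StreamLimbs` is a hypothetical helper name, and the input is an in-memory string so the example is self-contained; for a real file, `XmlReader.Create` also accepts a path. Only one limb element is held in memory at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: yields each <limb> element in turn while reading forward only.
static IEnumerable<XElement> StreamLimbs(XmlReader reader)
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "limb")
        {
            // XNode.ReadFrom consumes the whole element and leaves the reader on
            // the node that follows it, so Read() must not be called again here.
            yield return (XElement)XNode.ReadFrom(reader);
        }
        else
        {
            reader.Read();
        }
    }
}

string xml = "<tree><limb><a/></limb><limb><b/></limb><limb><c/></limb></tree>";
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    int count = 0;
    foreach (XElement limb in StreamLimbs(reader))
    {
        count++;
    }
    Console.WriteLine(count); // prints 3
}
```

To produce batches this way, the loop could collect limbs into a list and write a new document each time the list reaches the batch size.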

Upvotes: 2

Robert Groves

Reputation: 7738

Check out Linq-To-XML.

Upvotes: 1
