Stickman

Reputation: 136

Very large XML file generation

I have a requirement to generate an XML file. This is easy-peasy in C#. The problem (aside from slow database query [separate problem]) is that the output file reaches 2GB easily. On top of that, the output XML is not in a format that can easily be done in SQL. Each parent element aggregates elements in its children and maintains a sequential unique identifier that spans the file. Example:

<level1Element>
    <recordIdentifier>1</recordIdentifier>
    <aggregateOfLevel2Children>11</aggregateOfLevel2Children>
    <level2Children>
        <level2Element>
            <recordIdentifier>2</recordIdentifier>
            <aggregateOfLevel3Children>92929</aggregateOfLevel3Children>
            <level3Children>
                <level3Element>
                    <recordIdentifier>3</recordIdentifier>
                    <level3Data>a</level3Data>
                </level3Element>
                <level3Element>
                    <recordIdentifier>4</recordIdentifier>
                    <level3Data>b</level3Data>
                </level3Element>
            </level3Children>
        </level2Element>
        <level2Element>
            <recordIdentifier>5</recordIdentifier>
            <aggregateOfLevel3Children>92929</aggregateOfLevel3Children>
            <level3Children>
                <level3Element>
                    <recordIdentifier>6</recordIdentifier>
                    <level3Data>h</level3Data>
                </level3Element>
                <level3Element>
                    <recordIdentifier>7</recordIdentifier>
                    <level3Data>e</level3Data>
                </level3Element>
            </level3Children>
        </level2Element>
    </level2Children>
</level1Element>

The schema in use actually goes up five levels. For the sake of brevity, I'm including only 3. I do not control this schema, nor can I request changes to it.

It's a simple, even trivial matter to aggregate all of this data into objects and serialize them out to XML based on this schema. But with data at this volume, that strategy throws out-of-memory exceptions.

The strategy that is working for me is this: I populate a collection of entities through an ObjectContext that hits a view in a SQL Server database (a most ineffectively indexed database at that). I group this collection and iterate through it, then group the next level and iterate through that, and so on until I reach the highest-level element. I then organize the data into objects that reflect the schema (effectively just mapping) and set the sequential recordIdentifier. I've considered doing this in SQL, but the number of nested joins or CTEs would be ridiculous, given that the identifier spans from the header elements into the child elements. I write a higher-level element (say, the level2Element) with its children to the output file. Once I'm done writing at that level, I move to the parent group and insert the header with its aggregated data and identifier.
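For illustration, the write side of this strategy can be a forward-only XmlWriter, which never holds the whole document in memory. This is only a rough sketch of the idea, not my actual code: the element names match the sample above, but the hard-coded aggregates and in-memory sample data stand in for values that would really come from the grouped query results, computed before each element is written.

```csharp
using System.Xml;

class StreamingWriterSketch
{
    static int _recordId = 1; // sequential identifier spanning the whole file

    static void Main()
    {
        var settings = new XmlWriterSettings { Indent = true };
        using (var writer = XmlWriter.Create("output.xml", settings))
        {
            writer.WriteStartElement("level1Element");
            writer.WriteElementString("recordIdentifier", (_recordId++).ToString());
            // Assumed precomputed: in practice the aggregate is known only
            // after the children have been grouped.
            writer.WriteElementString("aggregateOfLevel2Children", "11");
            writer.WriteStartElement("level2Children");

            // Stand-in for iterating the grouped level-3 data per level-2 element.
            foreach (var group in new[] { new[] { "a", "b" }, new[] { "h", "e" } })
                WriteLevel2(writer, group);

            writer.WriteEndElement(); // level2Children
            writer.WriteEndElement(); // level1Element
        }
    }

    static void WriteLevel2(XmlWriter writer, string[] level3Data)
    {
        writer.WriteStartElement("level2Element");
        writer.WriteElementString("recordIdentifier", (_recordId++).ToString());
        writer.WriteElementString("aggregateOfLevel3Children", "92929");
        writer.WriteStartElement("level3Children");
        foreach (var d in level3Data)
        {
            writer.WriteStartElement("level3Element");
            writer.WriteElementString("recordIdentifier", (_recordId++).ToString());
            writer.WriteElementString("level3Data", d);
            writer.WriteEndElement(); // level3Element
        }
        writer.WriteEndElement(); // level3Children
        writer.WriteEndElement(); // level2Element
    }
}
```

Because XmlWriter is forward-only, the parent headers have to be written before their children, which is why the aggregates must already be computed when the header is emitted.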

Does anyone have any thoughts on a better way to output such a large XML file?

Upvotes: 2

Views: 252

Answers (1)

Alireza

Reputation: 10476

As far as I understand your question, your problem is not limited storage space (HDD); your difficulty is maintaining a large XDocument object in memory (RAM). To deal with this, you can avoid building such a huge object. For each recordIdentifier element you can call .ToString() and get a string, then simply append these strings to a file. Put the declaration and root tag in this file and you're done.
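A rough sketch of this idea, assuming the level-1 header values are known up front; the file name, loop, and sample values are illustrative, and a real implementation would stream entities from the database instead:

```csharp
using System.IO;
using System.Xml.Linq;

class AppendStringsSketch
{
    static void Main()
    {
        using (var file = new StreamWriter("output.xml"))
        {
            // Declaration and opening root tag written as plain strings.
            file.WriteLine("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
            file.WriteLine("<level1Element>");

            // Build each small subtree as its own XElement, serialize it,
            // and append it; only one small object is alive at a time.
            for (int i = 0; i < 2; i++)
            {
                var element = new XElement("level2Element",
                    new XElement("recordIdentifier", i + 2),
                    new XElement("level3Data", "a"));
                file.WriteLine(element.ToString());
            }

            // Closing root tag finishes the document.
            file.WriteLine("</level1Element>");
        }
    }
}
```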

Upvotes: 1

Related Questions