Stickman

Reputation: 136

Very large XML file generation

I have a requirement to generate an XML file. This is easy-peasy in C#. The problem (aside from slow database query [separate problem]) is that the output file reaches 2GB easily. On top of that, the output XML is not in a format that can easily be done in SQL. Each parent element aggregates elements in its children and maintains a sequential unique identifier that spans the file. Example:

<level1Element>
    <recordIdentifier>1</recordIdentifier>
    <aggregateOfLevel2Children>11</aggregateOfLevel2Children>
    <level2Children>
        <level2Element>
            <recordIdentifier>2</recordIdentifier>
            <aggregateOfLevel3Children>92929</aggregateOfLevel3Children>
            <level3Children>
                <level3Element>
                    <recordIdentifier>3</recordIdentifier>
                    <level3Data>a</level3Data>
                </level3Element>
                <level3Element>
                    <recordIdentifier>4</recordIdentifier>
                    <level3Data>b</level3Data>
                </level3Element>
            </level3Children>
        </level2Element>
        <level2Element>
            <recordIdentifier>5</recordIdentifier>
            <aggregateOfLevel3Children>92929</aggregateOfLevel3Children>
            <level3Children>
                <level3Element>
                    <recordIdentifier>6</recordIdentifier>
                    <level3Data>h</level3Data>
                </level3Element>
                <level3Element>
                    <recordIdentifier>7</recordIdentifier>
                    <level3Data>e</level3Data>
                </level3Element>
            </level3Children>
        </level2Element>
    </level2Children>
</level1Element>

The schema in use actually goes up five levels. For the sake of brevity, I'm including only 3. I do not control this schema, nor can I request changes to it.

It's a simple, even trivial matter to aggregate all of this data into objects and serialize them out to XML based on this schema. But with data at this volume, that strategy throws out-of-memory exceptions.

The strategy that is working for me is this: I populate a collection of entities through an ObjectContext that hits a view in a SQL Server database (a most ineffectively indexed database at that). I group this collection and iterate through it, then group the next level and iterate through that, and so on until I reach the highest-level element. I then organize the data into objects that reflect the schema (effectively just mapping) and set the sequential recordIdentifier. I've considered doing this in SQL, but the number of nested joins or CTEs would be ridiculous, given that the identifier spans from the header elements into the child elements. I write a higher-level element (say, the level2Element) with its children to the output file. Once I'm done writing at that level, I move to the parent group and insert the header with its aggregated data and identifier.
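For illustration, the write side of this strategy can be a forward-only XmlWriter, which never holds the whole document in memory. This is only a rough sketch of the idea, not my actual code: the element names match the sample above, but the hard-coded aggregates and in-memory sample data stand in for values that would really come from the grouped query results, computed before each element is written.

```csharp
using System.Xml;

class StreamingWriterSketch
{
    static int _recordId = 1; // sequential identifier spanning the whole file

    static void Main()
    {
        var settings = new XmlWriterSettings { Indent = true };
        using (var writer = XmlWriter.Create("output.xml", settings))
        {
            writer.WriteStartElement("level1Element");
            writer.WriteElementString("recordIdentifier", (_recordId++).ToString());
            // Assumed precomputed: in practice the aggregate is known only
            // after the children have been grouped.
            writer.WriteElementString("aggregateOfLevel2Children", "11");
            writer.WriteStartElement("level2Children");

            // Stand-in for iterating the grouped level-3 data per level-2 element.
            foreach (var group in new[] { new[] { "a", "b" }, new[] { "h", "e" } })
                WriteLevel2(writer, group);

            writer.WriteEndElement(); // level2Children
            writer.WriteEndElement(); // level1Element
        }
    }

    static void WriteLevel2(XmlWriter writer, string[] level3Data)
    {
        writer.WriteStartElement("level2Element");
        writer.WriteElementString("recordIdentifier", (_recordId++).ToString());
        writer.WriteElementString("aggregateOfLevel3Children", "92929");
        writer.WriteStartElement("level3Children");
        foreach (var d in level3Data)
        {
            writer.WriteStartElement("level3Element");
            writer.WriteElementString("recordIdentifier", (_recordId++).ToString());
            writer.WriteElementString("level3Data", d);
            writer.WriteEndElement(); // level3Element
        }
        writer.WriteEndElement(); // level3Children
        writer.WriteEndElement(); // level2Element
    }
}
```

Because XmlWriter is forward-only, the parent headers have to be written before their children, which is why the aggregates must already be computed when the header is emitted.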

Does anyone have any thoughts on a better way to output such a large XML file?

Upvotes: 2

Views: 252

Answers (1)

Alireza

Reputation: 10476

As far as I understand your question, your problem is not limited storage space (HDD); your difficulty is maintaining a large XDocument object in memory (RAM). To deal with this, you can avoid building such a huge object. For each recordIdentifier element you can call .ToString() and get a string, then simply append these strings to a file. Put the declaration and root tag in this file and you're done.
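A rough sketch of this idea, assuming the level-1 header values are known up front; the file name, loop, and sample values are illustrative, and a real implementation would stream entities from the database instead:

```csharp
using System.IO;
using System.Xml.Linq;

class AppendStringsSketch
{
    static void Main()
    {
        using (var file = new StreamWriter("output.xml"))
        {
            // Declaration and opening root tag written as plain strings.
            file.WriteLine("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
            file.WriteLine("<level1Element>");

            // Build each small subtree as its own XElement, serialize it,
            // and append it; only one small object is alive at a time.
            for (int i = 0; i < 2; i++)
            {
                var element = new XElement("level2Element",
                    new XElement("recordIdentifier", i + 2),
                    new XElement("level3Data", "a"));
                file.WriteLine(element.ToString());
            }

            // Closing root tag finishes the document.
            file.WriteLine("</level1Element>");
        }
    }
}
```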

Upvotes: 1

Related Questions