NImesh

Reputation:

Merging big files in C#

I have 7-8 XML files, each approximately 50 MB in size. What is the best way to merge them programmatically in C# without getting a System.OutOfMemoryException? So far I have tried reading each file into a StringBuilder and then putting the StringBuilders into an array, but I still get a System.OutOfMemoryException. Any help? Thank you, -Nimesh

Upvotes: 1

Views: 2094

Answers (7)

Ron Savage

Reputation: 11079

Personally, when I have to deal with XML files (forced by threat of physical violence usually), I do this:

  1. Load each file into a .NET DataSet via DataSet.ReadXml()
  2. Combine the information (via DataSet queries).
  3. Write out the combined DataSet to XML via DataSet.WriteXml()

Then I aggressively delete the original XML file and wipe the sectors where it existed on the disk to remove the taint. :-)
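The three steps above could be sketched roughly as follows. This is only an illustration: the class and method names are made up, it uses DataSet.Merge() for step 2 rather than any particular query, and it relies on ReadXml() inferring a schema from the files. Note that a DataSet holds all of its rows in memory, so on its own this may not avoid the OutOfMemoryException from the question.

```csharp
using System.Collections.Generic;
using System.Data;

public static class DataSetMerger
{
    // Hypothetical helper: loads each file into its own DataSet,
    // merges it into one combined DataSet, then writes the result.
    public static void Merge(IEnumerable<string> inputPaths, string outputPath)
    {
        var combined = new DataSet();

        foreach (string path in inputPaths)
        {
            using (var part = new DataSet())
            {
                part.ReadXml(path);    // infers tables and columns from the XML
                combined.Merge(part);  // tables are matched by name; rows without a key are appended
            }
        }

        combined.WriteXml(outputPath);
    }
}
```

Something like `DataSetMerger.Merge(new[] { "file1.xml", "file2.xml" }, "merged.xml")` would then write a single document containing the rows from both files.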

Upvotes: 1

pointernil

Reputation: 608

Merge them at the file-system level by invoking the "copy a.xml + b.xml" command, or by calling the Windows file-system APIs that the "copy" command uses.

Upvotes: 0

Jon Skeet

Reputation: 1500425

The details of what you need to merge are indeed vital. However, to start you off: you're likely to want an XmlReader for each of the input files, and an XmlWriter for the output file. That will let you stream both the input and the output.
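As a minimal sketch of that streaming approach, assuming "merge" means copying every child element of each input's root into one new root: the class name, the Merge method and the rootName parameter are hypothetical, since the question gives no schema.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XmlStreamMerger
{
    // Copies the child elements of each input's root element into a single
    // new root, holding only the current node in memory at any time.
    public static void Merge(IEnumerable<string> inputPaths, string outputPath, string rootName)
    {
        var writerSettings = new XmlWriterSettings { Indent = true };
        var readerSettings = new XmlReaderSettings { IgnoreWhitespace = true };

        using (XmlWriter writer = XmlWriter.Create(outputPath, writerSettings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement(rootName);

            foreach (string path in inputPaths)
            {
                using (XmlReader reader = XmlReader.Create(path, readerSettings))
                {
                    reader.MoveToContent();              // position on the root element
                    if (reader.IsEmptyElement) continue; // nothing to copy from this file
                    reader.Read();                       // step inside the root

                    // Depth drops back to 0 when the root's end tag is reached.
                    while (!reader.EOF && reader.Depth > 0)
                    {
                        if (reader.NodeType == XmlNodeType.Element)
                            writer.WriteNode(reader, false); // copies the subtree and advances past it
                        else
                            reader.Read();                   // skip comments, text, etc. under the root
                    }
                }
            }

            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}
```

WriteNode() already leaves the reader on the node after the copied element, which is why the loop only calls Read() when it did not copy anything; calling both would skip every other element in files without whitespace between tags.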

Another alternative would be to use XStreamingElement from LINQ to XML. I don't have any experience of this, but it may well be a simpler API to use. (The rest of LINQ to XML is certainly nicer than the DOM API.)
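For what it's worth, here is a rough sketch of how XStreamingElement might be used for this, again assuming the goal is to gather the root's child elements from every file under one new root. The helper names and the rootName parameter are invented for illustration. The inputs are read lazily through an iterator, so only one child element is materialized at a time.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class StreamingElementMerger
{
    // Lazily yields each direct child element of the file's root element.
    private static IEnumerable<XElement> ReadChildren(string path)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            reader.MoveToContent();
            if (reader.IsEmptyElement) yield break;
            reader.Read();

            while (!reader.EOF && reader.Depth > 0)
            {
                if (reader.NodeType == XmlNodeType.Element)
                    yield return (XElement)XNode.ReadFrom(reader); // advances past the element
                else
                    reader.Read();
            }
        }
    }

    // Chains the children of all input files into one lazy sequence.
    private static IEnumerable<XElement> ReadAll(IEnumerable<string> paths)
    {
        foreach (string path in paths)
            foreach (XElement child in ReadChildren(path))
                yield return child;
    }

    public static void Merge(IEnumerable<string> inputPaths, string outputPath, string rootName)
    {
        // The content is not enumerated until Save() writes it out.
        var root = new XStreamingElement(rootName, ReadAll(inputPaths));
        root.Save(outputPath);
    }
}
```

Each child element is still built as a full XElement, so this suits files made of many small records better than files with one enormous child.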

Upvotes: 3

Sunny Milenov

Reputation: 22310

Please, define "merge".

If you just want to concatenate the files, use StreamReader and read line by line.
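A plain concatenation along those lines might look like the sketch below (the class and method names are hypothetical). Only one line is held in memory at a time. Keep in mind that the result of concatenating several complete XML documents is not itself a well-formed XML document, because it ends up with more than one root element.

```csharp
using System.Collections.Generic;
using System.IO;

public static class FileConcatenator
{
    // Appends the lines of every input file to the output file, in order.
    public static void Concatenate(IEnumerable<string> inputPaths, string outputPath)
    {
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (string path in inputPaths)
            {
                using (var reader = new StreamReader(path))
                {
                    string line;
                    // ReadLine() returns null at the end of the file.
                    while ((line = reader.ReadLine()) != null)
                        writer.WriteLine(line);
                }
            }
        }
    }
}
```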

If you actually want to produce new, valid XML, go with XmlTextReader. It does not read the whole file into memory.

Upvotes: 1

ZombieSheep

Reputation: 29953

Not sure what you mean by merge in this case. Do you mean simple concatenation of the files, or are you inspecting the content?

For example,

file1.xml

<items>
    <item id="1">
        <name>Widget</name>
    </item>
    <item id="2">
        <name>Widget 2</name>
    </item>
</items>

file2.xml

<items>
    <item id="3">
        <name>Widget</name>
    </item>
    <item id="4">
        <name>Widget 2</name>
    </item>
</items>

could be combined as

<items>
    <item id="1">
        <name>Widget</name>
    </item>
    <item id="2">
        <name>Widget 2</name>
    </item>
</items>
<items>
    <item id="3">
        <name>Widget</name>
    </item>
    <item id="4">
        <name>Widget 2</name>
    </item>
</items>

which is quite trivial, or as

<items>
    <item id="1">
        <name>Widget</name>
    </item>
    <item id="2">
        <name>Widget 2</name>
    </item>
    <item id="3">
        <name>Widget</name>
    </item>
    <item id="4">
        <name>Widget 2</name>
    </item>
</items>

which is less so, given the amount of data you are talking about. Which do you mean?

Upvotes: 0

Cade Roux

Reputation: 89661

It depends what you mean by merge, since you haven't posted any information about the schema.

In the simplest case of homogeneous simple elements in a single collection, you would merge directly into a new file on disk, avoiding most in-memory work: strip the outer containing element from each input and write it once around the combined collection.

Upvotes: 0

Joel Coehoorn

Reputation: 415725

The thing about StringBuilder is that you're still trying to keep the entire contents in memory. You want to keep only a small portion in memory at a time, and that means using file streams. Don't read an entire file into memory; open a stream on it and keep reading from the stream.

The problem with XML is that you can't just append files to each other: you'll break the tag nesting. So you need to know something about the structure of your XML files so that you have an idea of what to do at each file boundary.

If you have something that works in theory with StringBuilder, but only fails in practice because of memory constraints, you should be able to translate the StringBuilder's .Append() and .AppendLine() method calls into .Write() and .WriteLine() calls for a filestream.
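One way to make that translation, sketched under the assumption that the existing code builds its output piece by piece: write against the TextWriter base class, and pass in a StreamWriter over the output file. The ItemWriter class and its made-up output are purely illustrative.

```csharp
using System.IO;

public static class ItemWriter
{
    // Hypothetical routine: each former sb.Append()/sb.AppendLine() call
    // becomes output.Write()/output.WriteLine(), so nothing accumulates in memory.
    public static void WriteItems(TextWriter output, int count)
    {
        output.WriteLine("<items>");
        for (int i = 1; i <= count; i++)
        {
            output.Write("    <item id=\"");
            output.Write(i);
            output.WriteLine("\" />");
        }
        output.WriteLine("</items>");
    }
}
```

Calling it as `using (var writer = new StreamWriter("merged.xml")) ItemWriter.WriteItems(writer, 1000000);` sends the text to disk as it is produced, while passing a StringWriter instead gives the old in-memory behaviour for small inputs or tests.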

Upvotes: 3
